ELECTRICAL ACTIVITY OF THE NERVOUS SYSTEM:
I. APPARATUS, RECORDING TECHNIQUES AND FIELD OF STUDY*


KAI JENSEN
University of Wisconsin 

In the realm of education, and human behavior generally, the role of the
central nervous system is admittedly great. The present program of research
utilizes electrical potentials of the brain as indicators of activity within
the central nervous system. It is thought that these electrical signs of brain
activity may have important consequences for our understanding and evaluation
of human behavior.


THE PROBLEM 


The present program of research, which has 
been under way for some time, involves the 
study of: the conditions under which “nor- 
mal” electrical activity of the brain may be 
secured; the developmental aspects of cortical 
potentials from the longitudinal point of 
view; brain waves under conditions of sleep, 
emotional disturbance, varying physico-chem- 
ical conditions, and auditory, visual, tactile 
and pain stimulation; the location of struc- 
tural abnormalities and pathology of the 
brain by the use of potential waves as signs; 
the origin and nature of the electrical activity 
generated in the brains of the abnormal; the 
relationship between differential patterns of 
cortical potentials and cytoarchitectonic struc- 
ture; and the origin and nature of the neuro- 
physiological correlates of complex forms of 
behavior, such as problem solving, concept 
formation, insight, and learning behavior. 


*This program of research was supported in part by a
series of grants from the Special Research Fund of the

II. HISTORICAL BACKGROUND

The important investigations of du Bois-Reymond (21) (1848) clearly
demonstrated some of the electrical properties of living tissue. His
researches shed much light on demarcation currents and action currents of
muscle and nerve, but it was not until 1875
that Caton (17) reported experiments in
which he told of finding distinct fluctuations 
of currents within the brains of living ani- 
mals. While attempting to investigate local- 
ization points on the brains of rabbits and 
monkeys, he noted distinct electrical activity 
within the brain itself. By placing non- 
polarizable electrodes upon the surface of 
the cerebral cortex and the skull and by 
conducting the resulting current through a 
sensitive galvanometer, he found galvanometer 
deflections which varied with the animal’s 
psychic function and physiological state, in- 
creasing with visual and other forms of stim- 
ulation and disappearing at death. 

Fleischl von Marxow (27) (1890)* carried 
Caton’s work further by showing that through 
peripheral stimulation these deflections were 
increased when the electrodes were placed on, 
or in the vicinity of, Munk’s visual area. He 
was perhaps the first to observe these brain 
oscillations by conducting the potentials 
through the skull of the intact animal. As 
a result of his observations on the brain poten- 
tials of animals, he predicted the possibility 
of studying the various psychic actions of 
man through the media of electrical potential 
changes conducted from the scalp. 

While working on the cerebral cortex of 
dogs, Beck (7) (1890), by placing two elec- 
trodes on the surface of the cortex, showed 
the existence of potential changes which had 
no apparent relationship with either the heart 
beat or respiration, and which were independ- 
ent of the animal’s physical movements. Like 
Caton (17) (1875), he showed that a strong 
current oscillation was set up in the occipital 
lobes if the eyes were stimulated with a bright 
light. 

* The material presented in this paper had originally been
deposited by Fleischl von Marxow with the Imperial Academy
of Vienna as a sealed manuscript in 1883.

In collaboration with Cybulski, Beck (8) (1892) continued his work with the
brains of monkeys and dogs in an attempt to show
that the currents in the cortex are self-exist- 
ing and are not transmitted currents. As a 
result of their work, they concluded that 
these electrical phenomena of the cerebral 
cortex correspond to simple psychic condi- 
tions and are not due to irrelevant physio- 
logical functionings of the organism. 

Gotch and Horsley (37) (1891), working 
with cats, rabbits, and monkeys, and using 
Lippmann’s capillary electrometer, failed to 
record currents when the animal was at rest 
but found distinct oscillations with peripheral 
stimulation. These were also found by 
Danilewsky (18) (1891). 

In 1904, Tchiriev (85), who was working 
with these cortical potentials, reached the 
conclusion that the potential changes were 
dependent upon the movement of the blood 
in the brain and therefore could not repre- 
sent the functional state of the central nervous 
system. 

In 1912, Kaufmann (46), using an improved
electrical system, Wiedemann’s galvanometer, 
was able to disprove Tchiriev’s (85) (1904) 
point of view. With his more sensitive appa- 
ratus he was able to record these regular, 
spontaneous oscillations from the skull of the 
animal. He was able to verify the existence 
of potential variations with peripheral stim- 
ulation. 

Prawdicz-Neminski (66) (1913), using the new string galvanometer,
established the influence of peripheral stimulation and verified
the results of Beck and Cybulski (8) (1892).

In 1925, Prawdicz-Neminski (67), using non-polarizable electrodes and the
large Edelmann string galvanometer, attempted, like Beck (7) (1890),
Danilewsky (18) (1891), and Kaufmann (46) (1912), to establish the existence
of spontaneous fluctuations of current in the cerebral cortex. Working with
dogs, he made simultaneous records of the “electrocerebrogram”, cerebral
pulse, and blood pressure, and arrived at the conclusion that, contrary to
Tchiriev’s (85) (1904) view, these fluctuations were not the result of
friction of the blood on the walls of the cerebral vessels, but rather that
they were related to certain psychical processes since they disappeared
before complete arrest of cerebral circulation. Prawdicz-Neminski (67) (1925)
was also able to demonstrate the existence of waves of the first and second
order, appearing 10 to 15 a second and 20 to 32 a second respectively, even
when conductions were made from the intact skull of the animal.

The majority of the authors cited above 
believed these cortical potential oscillations to 
be the expression of the activity of the cere- 
bral cortex of the animal, since they varied 
with changes in cortical function and disap- 
peared if the central nervous system was un- 
der the influence of a narcotic or if the animal 
expired. A distinction was also made between 
the regular existing current which could be 
conducted from the cortex while the animal 
was at rest, and the variations in this current 
produced by peripheral stimulation. These 
latter oscillations were particularly sensitive 
and disappeared with the cooling of the cor- 
tex and from inexplicable causes. 

In 1910, Berger (9) (1929), using the small Edelmann string galvanometer and
non-polarizable electrodes in his work with dogs, noted regular, minute
string oscillations when the animal was not under the influence of external
stimuli, if the electrodes were placed in symmetrical positions on the
cortex. Moreover, he was unable to elicit any change in these oscillations
upon peripheral stimulation.

Some years later, Berger (9) (1929) began a series of experiments on dogs,
using the large Edelmann string galvanometer and a modified form of the
Siemens and Halske double-coil galvanometer. Special precautions were taken
to prevent cooling and evaporation from the cerebral cortex by inserting zinc
plate electrodes into the subdural cavity and by filling the trephined points
with bone wax. Berger was then able to show the existence of regular current
oscillations when the electrodes lay over the right and left hemispheres, as
well as at two points on the same hemisphere. He was also able to record
simultaneously the “cerebrogram” and the electrocardiogram. In an attempt to
show that these cerebral oscillations were not due to filling of the veins
and arteries, and breathing, the upper cervical spinal cord was cut in the
experimental animal under observation. Breathing ceased, and finally, after a
short time, the heart beat ceased also. The electroencephalogram, however,
which was conducted from both hemispheres of the dog, continued after
the heart beat had ceased. The brain oscillations changed only in so far as
they became more regular. On the basis of these experiments Berger concluded
that the cerebral current oscillations could not be merely mechanical
consequences of cerebral blood movements or of breathing behavior.

(March, 1938)

This experiment remains perhaps the nicest verification of the theory of
cortical origin of brain potentials brought forward up to this
point, and was in complete harmony with the 
views held by Beck (7) (1890), Danilewsky 
(18) (1891), Kaufmann (46) (1912), and 
others. 

Berger (9) (1929) was further able to verify the observations of
Prawdicz-Neminski (67) (1925) who had demonstrated the presence of two
distinct waves, which he called first and second order waves, appearing 10 to
15 times a second and 20 to 32 times a second respectively. According to
Berger’s results, the amplitude of the current oscillations conducted from
the brain surface of dogs reached an average of 200 to 600 microvolts for the
larger 90-100 millisecond waves and 130 microvolts for the shorter and
smaller 40-50 millisecond waves.

Having demonstrated the presence of spon- 
taneous electrical activity in the brains of 
dogs and monkeys, Berger began his pioneer 
work which resulted in his demonstration of 
the presence of the electroencephalogram, or 
Berger Rhythm, in the brains of human 
beings. 

In 1924, while working with a 17-year-old youth who had been trepanned
palliatively above the left cerebral hemisphere for a suspected tumor, Berger
(9) (1929) (after inserting a high resistance platinum and quartz wire, i.e.
5200 and 3200 ohms, into his circuit) succeeded in receiving regular
oscillations of the galvanometer strings when both electrodes were in the
region of the trepanation and about 4 cm. apart. This original discovery was
made with the small Edelmann string galvanometer and no record was possible
with this apparatus. He was later able, with the aid of a Siemens and Halske
double-coil galvanometer, to verify the original observation. He obtained
tracings by introducing needle electrodes extra-durally through the site of
the trephine opening in previously operated patients, and also got similar
curves from normal persons using lead foil electrodes applied to the scalp.
He found it possible to record these regular oscillations, which are
distinguishable by waves of two characteristics, having an average duration
of 90 and 35 milliseconds, the amplitude of the large wave amounting to 150
to 700 microvolts and that of the 35 millisecond wave to 20 to 30
microvolts.
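
The wave durations quoted above can be compared with the frequencies used elsewhere in this review by taking the reciprocal of the duration. The following is a minimal sketch in Python, added for illustration and not part of Berger's report; the function name `waves_per_second` is an assumption of this sketch.

```python
def waves_per_second(duration_ms):
    """Convert the duration of one wave (milliseconds) to waves per second."""
    if duration_ms <= 0:
        raise ValueError("duration must be positive")
    # One second holds 1000 ms, so the count per second is 1000 / duration.
    return 1000.0 / duration_ms


# Average durations of the two wave types, as quoted in the text.
for duration in (90, 35):
    print(f"{duration} ms wave -> {waves_per_second(duration):.1f} per second")
```

On this reckoning the 90 millisecond wave corresponds to roughly 11 waves a second and the 35 millisecond wave to roughly 29 a second.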

Berger believed these potential oscillations 
were due to electrical changes within the brain 
of the individual, and held that they were 
independent of other physiological functions. 
The alpha or “Berger” rhythm appears at a 
frequency of about ten a second in normal 
adults, and has been assumed to represent the 
spontaneous electrical function of the resting 
cortex. This rhythm can best be recorded in 
conditions of mental repose when the eyes 
are closed. Sensory (i.e. tactual, auditory, 
and visual) stimulation and mental activity 
tend to diminish or even to abolish the wave. 
The cortical rhythms seem to be profoundly 
affected by certain diseases of organic nature, 
as well as by the effects of various narcotics 
which affect the central nervous system. Dur- 
ing natural sleep the amplitude of the waves 
is diminished. Berger (12) (1932) found the 
frequency of the waves to be partly a func- 
tion of age and also found no well-established 
rhythm in children under one year of age. In 
children of from three to four months he 
found a wave of lower frequency and longer 
duration than in normal adults. This devel- 
opmental trend continued up to the age of 
five years, the upper age limit of his experi- 
mental group, where he found values of 110 
and 120 milliseconds which closely approach 
those of the adult. Lindsley (57) (1936) and 
Smith (72) (1937) also found developmental 
trends in the frequency of the alpha wave 
with an increase up to the age of eight years 
where the adult frequency was found. 

Berger (9) (1929) thought of the electrical 
activity which he studied as emanating from 
the entire cortex, but Adrian and Matthews 
(2) (1934) have held that it originated in the 
occipital lobe of the brain and was closely 
related to visual functions. Still more re- 
cently Jasper and Andrews (43) (1938) have 
presented results tending to confirm Berger’s 
original position. 


There is general agreement as to the fre- 
quency of the alpha waves in adults (10 to 
10.5 per second with an accepted range of 
from 8 to 13 a second), but considerable dis- 
agreement as to the frequency range of the 
beta rhythms. Berger (9) (1929) mentioned 
beta waves with frequencies of from 20 to 25 
per second. A year later (10) (1930) he reported beta frequencies of from 25
to 33 a second. In 1932 and 1933 Berger (12) (13) published a frequency range
of from 20 to 50 a second. Gibbs, Davis, and Lennox (34) (1935) quote Berger
as having found a range of frequencies from 50 to 60 per second. Jasper and
Carmichael (44) (1935) obtained an average frequency of 25 for the beta
waves, but also reported beta frequencies ranging from 25 to 50 a second.
Foerster and Altenburger (28) (1935) reported beta waves with
an average frequency of 33.3 a second. Davis 
and Davis (19) (1936) obtained a range of 
from 18 to 50 a second with an average fre- 
quency of 25 a second for the beta waves. 
Lemere (54) (1936) found a range of from 
18 to 35 a second. Liberson (56) (1936), 
after a survey of the literature, placed the 
average range of the beta waves between 25 
and 40 a second. Jasper and Andrews (43) 
(1938), in a quite recent publication, place 
the frequency range of the beta waves be- 
tween 17 and 30 per second, with an average 
frequency of 25. All experimenters are 
agreed that the magnitude of the beta waves 
is considerably less than that of the alpha 
waves. This means that the difficulty of de- 
tecting and eliminating artifacts is greatly 
increased. 
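
The extent of the disagreement over the beta range can be summarized by tabulating several of the ranges quoted above and computing their outer limits and their common overlap. The sketch below is illustrative only: the figures are copied from the preceding paragraph, while the names `reported_ranges`, `envelope`, and `common_overlap` are assumptions of this sketch and not terms used by any of the authors cited.

```python
# Beta-wave frequency ranges (waves per second) as quoted in the text.
reported_ranges = {
    "Berger (1929)": (20, 25),
    "Berger (1930)": (25, 33),
    "Jasper and Carmichael (1935)": (25, 50),
    "Davis and Davis (1936)": (18, 50),
    "Lemere (1936)": (18, 35),
    "Liberson (1936)": (25, 40),
    "Jasper and Andrews (1938)": (17, 30),
}


def envelope(ranges):
    """Return the lowest lower limit and the highest upper limit."""
    lows = [low for low, _ in ranges.values()]
    highs = [high for _, high in ranges.values()]
    return min(lows), max(highs)


def common_overlap(ranges):
    """Return the interval shared by every range, or None if there is none."""
    low = max(low for low, _ in ranges.values())
    high = min(high for _, high in ranges.values())
    return (low, high) if low <= high else None


print("outer limits:", envelope(reported_ranges))
print("common overlap:", common_overlap(reported_ranges))
```

For the figures entered here the outer limits are 17 and 50 a second, and the only frequency common to every range is 25 a second, which agrees with the average of 25 reported by several of the authors.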

In a recent publication Berger (15) (1937) 
retains his original position with respect to 
the existence of the beta waves as a separate 
and distinct order of waves, but modifies his 
interpretation somewhat. He now believes 
that beta waves originate somewhere in the 
outer three layers of the cortex and represent 
the psycho-physiological activities of the 
brain. He further believes that the appear- 
ance of these waves during mental activity is 
due to an increase in the amplitude of the 
wave itself and is not merely the result of the 
removal of the alpha rhythm which, when 
present, obscures the beta waves. 

A third phenomenon observed in the elec- 
trical activities recorded from the human 
brain is the appearance of periods of inac- 
tivity in the production of alpha waves. 
These latent periods, which are of the order 
of one second, are less fixed and constant 
and have as yet been little studied. 


After the excellent pioneer experiments of 
Berger had been confirmed by originally 
skeptical Adrian and Matthews (2) (1934), 
by Jasper and Carmichael (44) (1935), and 
by Gibbs and Davis (33) (1935), interest in 
the human electroencephalogram became very
great.

III. PROOF OF BRAIN ORIGIN

Tchiriev (85) (1904), as has already been 
said, believed the potential changes which he 
found in his researches to be dependent upon 
the movements of the blood in the brain. 
Danilewsky (18) (1891), Beck (7) (1890), 
Prawdicz—Neminski (67) (1925), and Berger 
(9) (1929), on the other hand, held that these 
oscillations were due, at least partly, to some 
activity within the brain, independent of cere- 
bral blood movements. Since then, numerous 
control experiments have been conducted in 
an endeavor to rule out the possibility that 
these potential oscillations, recorded from the 
surface of the head, are caused by some or- 
ganic process other than brain activity, and 
to demonstrate their cortical origin. One 
must, of course, always be on the alert for 
artifacts, and it is often not easy to distin- 
guish these with precision from brain 
potentials. 

Because of their form and frequency, the 
potentials of the electroencephalogram can be 
distinguished from muscle action currents. 
Adrian and Matthews (2) (1934) have shown 
that if the electroencephalogram were due to 
a clonus or tremor of the orbital musculature, 
then active and passive movements of the eye 
ball should give corresponding waves. They 
argued against the orbital origin of these po- 
tential changes by pointing out that the ex- 
ternal eye muscles are deeply buried in the 
orbit and that their action currents could have 
but little effect on electrodes placed on the 
scalp. They presented experimental evidence 
to show that neither active movements of the 
eyeball, produced by looking at the spokes 
of a revolving wheel, nor passive movements 
produced by the oscillations of the eyeball, 
when in contact with a clockwork driven rod, 
give a corresponding potential wave. They 
further presented experimental evidence to 
show that movements of the head and neck, 
and wrinkling of the forehead and scalp, pro- 
duce electromyograms, but do not alter the 
electroencephalogram in any way. 

If the potentials of the electroencephalogram were due to clonus or tremor of
the orbital musculature, then the potential gradient would be at a maximum in
the neighborhood of the eye; a conclusion which is not verified by the data.
(Adrian and Matthews (2) (1934)) If the potentials were of muscle origin,
they should be greatest on the skin surface, but Berger (9) (1929), (12)
(1932); Adrian and Matthews (2) (1934); Jasper and Carmichael (44) (1935);
and Jasper and Andrews (42) (1936) have shown that the amplitude of the
oscillations increases when the electrodes are placed directly in contact
with the cortex, on the periosteum of the skull, or over a trephine opening
in the skull.

The potentials due to scalp movements, eye
movements, eye blinks, head movements, and 
arrectores pilorum contractions were carefully 
ruled out by Berger (12) (1932), and have 
been shown to be clearly distinguishable from 
the regular alpha and beta waves, in fre- 
quency as well as in form and amplitude, by 
Jasper and Andrews (42) (1936). 

Adrian and Matthews (2) (1934) have fur- 
ther shown that the wave could not be due 
to retinal potentials since electrodes placed 
on the scalp could not pick up even the poten- 
tial changes caused by illuminating the eye 
suddenly with a bright light. 

Simultaneous electrocardiograms and elec- 
troencephalograms show no apparent rela- 
tionship between heartbeat and alpha and 
beta waves. Berger (9) (1929), (12) (1932) 
was able to show that even a momentary ar- 
rest of the heartbeat did not produce appre- 
ciable effect on the electroencephalogram of a 
dog, and that the brain rhythm may continue 
even after the heart has ceased to beat. 

Respiration curves are not related to the 
electroencephalogram as has been pointed out 
by Berger (9) (1929) and Lindsley and Ru- 
benstein (58) (1937). Simultaneous record- 
ing of the cerebral plethysmogram and the 
electroencephalogram by Berger (11) (1931) 
also showed that there was no relationship 
between brain pulse or volume change and 
the electroencephalogram. 

According to Berger (11) (1931), the electroencephalogram during normal
sleep appears to be diminished in amplitude with no
apparent change in frequency, although 
Loomis, Harvey, and Hobart (59) (60) 
(1935) have shown that the changes in the 
electroencephalogram during sleep are more 
complex than Berger held. 

The alpha wave of the human electroen- 
cephalogram is augmented in amplitude after 
a cocaine injection and decreases after a large 
dose of scopolamine. In like manner, it de- 
creases in deep general anesthesia, just as it 
increases in height during excitation periods. 
(Berger (11) (1931)) The effect of the barbiturates is quite different,
giving an apparent increase in magnitude, and a grouping of
alpha waves with some frequency changes. 
(Berger (9) (1929), Adrian and Matthews 
(2) (1934)) 

An altered electroencephalogram is recorded when the electrodes are placed
over abnormal brain tissue if the cortex is affected. (Foerster and
Altenburger (28) (1935)) Under the influence of pressure on the brain from
tumors, cerebri hydrocephalus, and intra-cranial bleeding, the
electroencephalogram undergoes a change in which the alpha waves are
lengthened. (Berger (11) (1931), Walter (86) (1936))

In the unconsciousness of the epileptic fit, 
and in deep narcosis, the alpha waves are 
markedly modified or may be entirely lacking. 
(Berger (11) (1931), (12) (1932); Gibbs, 
Davis, and Lennox (34) (1935); Gibbs, Len- 
nox, and Gibbs (35) (1936), (36) (1936)) 

As another bit of evidence to establish the 
brain origin of the observed electrical activ- 
ity, Berger (11) (1931) has shown that the 
characteristic electroencephalogram can be
traced from the cortex of the brain but not 
from the brain stem. Recently, however, 
Spiegel (74) (1937) has been able to record 
from the thalamic nuclei of the thalamus 
curves which correspond closely to the alpha 
and beta waves found by Berger in his elec- 
troencephalograms. 

It has also been experimentally shown that 
concentration of attention, as in solving arith- 
metical problems, tends to diminish or to 
abolish the alpha waves of the electroenceph- 
alogram. These researches by Berger (11) 
(1931), (15) (1937); Adrian and Matthews 
(2) (1934); Foerster and Altenburger (28)
(1935); and Rohracher (69) (1935) tend to 
show the close relationship between the elec- 
trical activity of the cortex and psychic 
function. 

Various forms of sensory stimulation, ac- 
cording to Berger (11) (1931), Jasper and 
Carmichael (44) (1935), and Travis and 
Gottlober (80) (1936), may tend to diminish 
or abolish the alpha waves, after a latency 
period of from 0.2 to 0.4 seconds. Loomis, 
Harvey, and Hobart (59) (60) (1935) have 
shown that sensory stimulation of the sleep- 
ing subject may also cause bursts of alpha 
potentials without awakening him. 

Dusser de Barenne and McCulloch (24) 
(1936) have produced evidence of a different 
nature to show that the electroencephalogram 
is of cortical origin. Working with the exposed cortex of animals, they found that
thermo-coagulation at 80° C. for 5 seconds 
killed the entire cortical thickness, and imme- 
diately and completely abolished all charac- 
teristic action potentials. 


IV. RECORDING TECHNIQUES 

A. Apparatus 

The electrical potentials of cerebral origin 
recorded from the surface of the skull vary 
from a few microvolts to about 1000 micro- 
volts. Therefore, any galvanometer or other 
apparatus designed to register these cortical 
potentials must be extremely sensitive to be 
able to pick up and record the minute poten- 
tial variations. Not only must the apparatus 
be extremely sensitive, but it should be 
capable of faithful reproduction of the form 
of the potential variations involved. Since 
the duration of these potentials varies within 
very wide limits, the apparatus should be 
equally sensitive to potential variations from 
one to at least one hundred per second. The 
frequencies of the waves characteristic of the 
human electroencephalogram vary from 2 to 
13 a second for the alpha (taking into con- 
sideration young children, special experimen- 
tal conditions, and certain pathological cases), 
and from 17 to 50 a second for the beta wave. 
Consequently, unless one wishes to deliber- 
ately exclude all waves above and below a 
specific limit, as may be the case under cer- 
tain circumstances, the amplification and re- 
cording system must have a sensitivity suffi- 
ciently large to cover this frequency range. 
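
The requirement stated above can be put as a simple check: a recording system is adequate only if its working frequency range contains both the alpha and the beta limits given in the text. The following sketch is an illustration under stated assumptions and does not describe any particular instrument; the names `classify_wave` and `recorder_covers` are hypothetical, and the 1 to 40 per second recorder in the last line is an invented example.

```python
# Frequency limits (waves per second) taken from the paragraph above.
ALPHA_RANGE = (2.0, 13.0)   # widest alpha limits, including special cases
BETA_RANGE = (17.0, 50.0)   # beta limits


def classify_wave(frequency):
    """Label a frequency as 'alpha', 'beta', or 'unclassified'."""
    if ALPHA_RANGE[0] <= frequency <= ALPHA_RANGE[1]:
        return "alpha"
    if BETA_RANGE[0] <= frequency <= BETA_RANGE[1]:
        return "beta"
    return "unclassified"


def recorder_covers(passband, wave_ranges=(ALPHA_RANGE, BETA_RANGE)):
    """Return True if the recorder's (low, high) range contains every wave range."""
    low, high = passband
    return all(low <= w_low and w_high <= high for w_low, w_high in wave_ranges)


print(classify_wave(10))               # falls within the alpha limits
print(recorder_covers((1.0, 100.0)))   # the range recommended in the text
print(recorder_covers((1.0, 40.0)))    # hypothetical recorder limited to 40 a second
```

A system equally sensitive from one to one hundred per second passes the check, whereas the hypothetical recorder limited to 40 a second would fail to follow the upper part of the beta range.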


The earlier researches of Berger (9) (1929) 
were conducted with the Edelmann String 
Galvanometer, and later, with a Siemens and 
Halske Double Coil Galvanometer. 

With the introduction of the use of the 
electron-tube amplifier for the study of brain 
rhythms, an extremely sensitive and useful 
tool was made available for the amplification 
of these minute potential oscillations. Since 
then amplifiers, suitable for the magnification 
of these brain potentials up to a point where 
even quite minute potential variations can be 
studied, have been designed and built. Con- 
denser-coupled amplifiers are generally used, 
but for some special purposes a direct-coupled 
amplifier has been found advantageous. For 
simultaneous recording from different brain 
areas a balanced input amplifier is necessary. 
Among the amplifier systems which have been 
proposed are those of Scheminsky (70) (1928); Matthews (63) (1928); Bartley
and Newman (6) (1930); Fessard (26) (1932); Matthews (64) (1934); Garceau and
Davis (30) (1934); Spiegel (73) (1934); Jasper and Andrews (42) (1936);
Koopman and Hoelandt (47) (1936); and Huddleston, Whitehead, and Moritz (40)
(1936).

These amplifiers have been used in con- 
junction with oscillograph systems differing 
from one another in various ways, particu- 
larly as to the method of recording and as to 
their ability to follow various frequencies. 
The Matthews optical type oscillograph has 
been used by Adrian and Matthews (2) (3) 
(1934), Wang (87) (1934), Adrian and 
Yamagiwa (4) (1935), and Lemere (54)
(1936). Kreezer (52) (1936); Jasper and 
Andrews (42) (1936); Travis and Gottlober 
(81) (1937); Travis, Knott and Griffith (84) 
(1937); and Gottlober (38) (1938) have all 
used the Westinghouse Oscillograph. 

In many ways the cathode ray oscillograph 
is the ideal recording instrument since the 
stream of electrons which it uses has neither 
appreciable mass nor damping, and it permits 
faithful registration of frequencies up to at 
least 1,000,000 cycles per second. Gasser 
and Erlanger (32) (1922) pioneered the use 
of this instrument in physiological work, but 
only recently has it come into use for the 
study of the electrical activity of the cerebral 
cortex. (Dusser de Barenne and McCulloch 
(24) (1936), Walter (86)(1936), and Blake 
(16) (1937)) Among those who have de- 
scribed setups for utilizing this potent instru- 
ment are Schmitz (71) (1933); Garceau and 
Davis (30) (1934); Koopman and Hoelandt 
(47) (1936); McCulloch and Wendt (62) 
(1936); Huddleston, Whitehead, and Moritz 
(40) (1936); and Gans (29) (1937). 

Partly to overcome the expense of photo- 
graphically recording the oscillations of the 
moving spot on the cathode ray tube, various 
ink-writing oscillographs have been developed. 
(Toennies (76) (1932), (77) (1933); Adrian 
and Matthews (2) (1934); Garceau and Davis 
(31) (1935); Loomis, Harvey and Hobart 
(59) (1935); and Offner and Gerard (65)
(1936)) These oscillographs have a maxi- 
mum frequency of about 40 to 60 a second, 
but have an advantage in that they are rela- 
tively inexpensive to operate and can be read 
instantaneously. 

Some other experimenters have employed 
loud-speakers as an added means of following the potential oscillations of
the electroencephalogram. (Adrian and Matthews (2) (1934); Garceau and Davis
(30) (1934); and Huddleston, Whitehead, and Moritz (40) (1936))

B. Electrodes 

Several kinds of electrodes have been employed for the conduction of
electrical potentials from the cortex and the surface of the
skull or scalp by workers in this field.

Berger (9)(1929) used fresh amalgamated 
zinc plates about 12 mm. by 4 mm., the four 
corners of which were rounded off in order to 
prevent injury. A well-insulated wire was 
soldered to the plates, which were inserted 
through a slit in the dura next to the surface 
of the cortex of the experimental animal. In 
subsequent work with human beings, Berger 
(9)(1929) used steel needle electrodes which 
were zincified and insulated up to the point 
with a coat of lacquer. Funnel electrodes 
were also used by Berger (9) (1929), but, due 
to the danger involved in the use of the zinc 
sulfate solution, they were employed only for 
standard records for purposes of comparison. 

Metal electrodes of round copper, plati- 
num, or silver plates wrapped in a somewhat 
larger piece of flannel soaked in a twenty per- 
cent sodium chloride solution, were employed 
by Berger (9)(1929). Jasper and Carmichael 
(44) (1935) and Lemere (54) (1936) used 
silver discs, 1 to 2 cm. in diameter, covered 
with flannel soaked in sodium chloride solu- 
tion. Adrian and Matthews (2)(1934) em- 
ployed electrodes analogous to the plate elec- 
trodes of Berger (9) (1929) but of smaller 
diameter. Lead foil electrodes placed be- 
tween two layers of flannel soaked in 20% 
sodium chloride solution were employed by 
Berger (9)(1929). These metal electrodes 
were generally applied to the clean and hair- 
less scalp, and were held in place by thin 
rubber ribbons, which also prevented drying 
of the flannel pads during the course of the 
experiment. Davis and Davis (19)(1936) 
held their plate electrodes in place with San- 
born’s Electrode paste and collodion. 

Non-polarizable silver-silver chloride needle electrodes, well-insulated up
to the point by a coat of lacquer, have been used by Berger (10) (1930);
Dusser de Barenne and McCulloch (24) (1936); Jasper and Carmichael (44)
(1935); Travis and Knott (82) (1936); and Travis, Knott, and Griffith (84)
(1937). The silver-silver chloride wire was prepared according to the method
of Stadie, O’Brien, and Laug (75) (1931).

Concentric needle electrodes, made by pass- 
ing a small wire down the shaft of a hypo- 
dermic needle, have been used by Wang (87) 
(1934); Wang and Lu (88) (1936); Gibbs,
Lennox, and Gibbs (35) (36)(1936); and by 
Adrian and Bronk (1)(1929). McCulloch 
and Dusser de Barenne (61)(1936) used a 
concentric silver-silver chloride agar electrode 
of 4 mm. internal and 6 mm. external 
diameter. 

Walter (86)(1936) used silver-silver chlo- 
ride pad electrodes held in place by a special 
cap and moistened with salt solution. For 
work with the exposed brain he used sterilized 
silver-silver chloride wick electrodes filled 
with sterile normal saline. 

In working with epileptic subjects, Gibbs, 
Davis, and Lennox (34)(1935) found a dif- 
fused crown electrode best, since it eliminated 
the possibility of injury from needle elec- 
trodes as a result of violent movements on 
the part of the subjects. 

Kreezer (52)(1936) employed a small coil 
of silver wire attached to a piece of rubber 
sponge which was held on the head by means 
of an elastic band. Contact with the skin 
was made with saline electrode paste. 

Dietsch (20) (1932) believed silver-silver 
chloride needles subject to polarization and 
employed silver plates, about 5 mm. in 
diameter, covered with spongy platinum. 

Hoagland, Rubin and Cameron (39) 
(1937), in their work with schizophrenic 
cases, used electrodes made from small lead 
pellets, about 2—3 mm. in diameter, cemented 
to the scalp with collodion, and making con- 
tact with the scalp through a salt electrode 
paste. 

Adrian and Yamagiwa (4)(1935) have de- 
veloped electrodes consisting of a small coil 
of silver wire, coated with silver chloride con- 
tained in a small glass tube filled with gelatin 
jelly made up with saline and plugged with 
a bit of absorbent cotton. The glass tube 
was held in a rectangular slab of rubber which
was bandaged to the surface of the head. 

Jasper and Andrews (42)(1936) used elec- 
trodes similar to those of Adrian and 
Yamagiwa (4) (1935), but introduced the 
silver-silver chloride wire into a glass T-tube 
filled with 10% sodium chloride solution, the 
open end of which was stopped with a bit of 
absorbent cotton. The closed end of the 
T-tube was passed through a sponge rubber
block, and was held in place on the head by 
an elastic band. Contact with the scalp was 
made with the moist cotton of the stopped-up 
end of the T-tube. 


More recently, Jasper and Andrews (43) 
(1938) have abandoned this type of electrode 
in favor of a simpler kind. The new elec- 
trode consists of a small, hat-shaped object 
made of chlorided silver with felt-covered 
brims. These electrodes have an inside
diameter of 5 mm. and are fixed to the head 
with collodion and electrode jelly. In gen- 
eral, different types of electrodes have been 
employed by the various workers with more 
or less success in specific cases, while working 
with different subjects, and using various 
amplifier systems. 


Special problems, such as localization on 
the exposed cortex, work on young children, 
or on violent epileptic subjects, obviously re- 
quire special electrodes. For general pur- 
poses, Jasper and Andrews (42)(1936) list 
the following as desirable characteristics of 
an electrode: First, the resistance of the 
electrode and the skin contact should be as 
low as possible. Second, contact with the skin 
should remain constant throughout the course 
of the experiment, and no potential disturb- 
ances should arise from this contact. Third, 
the electrodes should also permit convenient 
and efficient attachment to any point on the 
head surface, so that they will not be dis- 
turbed by head movements. Finally, they 
should be comfortable for the individual to 
whom they are attached. 


Jasper and Andrews (42)(1936) seem to 
favor surface electrodes in preference to needle 
electrodes inserted through the skin to the 
periosteum, because they are more convenient 
and comfortable, and because no anesthesia or 
asepsis is necessary. Their records, taken from 
a pair of surface electrodes simultaneously 
with records from needle electrodes, show no 
qualitative differences between the two meth- 
ods of recording except that in some records 
from the needle electrodes the contact arti- 
facts have a greater amplitude. If the elec- 
trodes are brought together up to a distance 
of 2 cm. from each other, the needle electrodes 
may pick up from 25% to 30% more poten- 
tial than the surface electrodes directly above 
them. This difference is much less if the 
electrodes are farther apart. 

Berger (12) (1932), Kornmueller (40)
(1933), and Jasper and Carmichael (44) 
(1935) have also shown that simultaneous 
records taken from a pair of needle electrodes, 
inserted through the scalp to the periosteum, 
and a pair of surface electrodes on the scalp 
directly above, are practically identical in 
form, although the potentials picked up by 
the needle electrodes are generally slightly 
larger.

Greater stability and freedom from contact 
artifacts, such as arise from cut tissue, are 
obtained from surface electrodes. Jasper and 
Andrews (42)(1936) further found that the 
resistance between needle electrodes inserted 
through the skin is usually of the same order 
of magnitude as that of the surface electrodes.


C. Placement of Electrodes 


Two distinct methods of electrode place- 
ment, the bipolar and the monopolar, have 
been used in the study of the electrical activ- 
ity of the cortex. The pioneer researches of 
Berger (9)(1929), (10)(1930), (11) (1931), 
dealing with human subjects, were conducted 
upon trephined individuals, and the needle 
electrodes were inserted through the scalp 
within the region of the site of the trephine
openings at points at least 15 mm. apart. 
Berger believed that the entire cortex was 
equally active, and hence both electrodes had 
to be active as applied to the source of the 
potential activity. Later Berger (14) (1935), 
in recording transcortical potentials from nor- 
mal human subjects, placed the electrodes at 
opposite extremities of the skull. Most fre- 
quently he placed one electrode at the level 
of the frontal bone and the other on the con- 
tralateral occiput, as did also Lemere (54) 
(1936). Jasper and Carmichael (44) (1935) 
and Jasper and Andrews (42)(1936), (43) 
(1938) used ipsolateral and contralateral 
placements. 

After the work of Adrian and Matthews 
(2)(1934), the electrodes were placed most 
frequently a short distance apart at any 
determined level of the skull. 

Loomis, Harvey, and Hobart (59) (1935) 
placed their electrodes on the high forehead 
and crown of the head. In later recordings 
from double amplifiers, they placed electrodes 
in the midline on the forehead, the crown, 
and the occiput, the amplifiers being con- 
nected between forehead and crown electrodes 
and between crown and occiput electrodes. 
Kreezer (52)(1936), using a similar double 
amplifier arrangement, placed his electrodes
[...] inch to the right of the median plane,
over the right occipital area, right motor area,
and the anterior part of the frontal area.


The unipolar method owes its origin, in
part, to the work of Adrian and Matthews
(2)(1934), who were led to the conclusion
that the alpha wave originated in the occipital 
lobes, even though it could be picked up from 
across various parts of the head. Their scheme 
indicates a single source of oscillating poten- 
tials located in the occipital region. They 
concluded that the position of one of the 
electrodes was of little importance (they 
called it an “indifferent” electrode), as long 
as the active electrode was placed on the 
occipital portion of the head. 


Kornmueller (48) (49) (1933), (50) (51) 
(1935); Toennies (76) (1932), (77) (78) 
(1933), (79) (1935); Dusser de Barenne and 
McCulloch (23) (1936); and Foerster and 
Altenburger (28)(1935) have used this prin- 
ciple of the “indifferent” electrode in their 
work on animals. These experimenters ap- 
plied the active electrode to various points on 
the skull or cortex and placed the “indiffer- 
ent” electrode on the eye or ear of the animal. 


Davis and Davis (19) (1936); Adrian and 
Matthews (2) (1934); Travis and Gottlober 
(80) (1936), (81)(1937); Travis and Knott 
(82)(1936); Durup and Fessard (22) (1936); 
and Gibbs, Lennox, and Gibbs (35) (36) 
(1936) all used monopolar electrode place- 
ment in their experiments with human beings 
and also spoke of “active” and “inactive” 
electrodes. The inactive, indifferent, or 
ground electrode is most often placed on the 
ear lobe or neck of the subject, and the poten- 
tial activity is assumed to originate in the 
immediate vicinity of the active electrode. 


Davis and Davis (19)(1936) placed their 
active electrode on the top of the head at a 
point just above the occipital protuberance 
on the midline, a position corresponding 
roughly to the motor and visual cortex, the 
reference electrode being on the left ear of 
the subject. 


Travis and Gottlober (80) (1936), (81) 
(1937) placed their active electrode over the 
left occipital area, as did Travis and Knott 
(83)(1937) and Durup and Fessard (22) 
(1936). Travis and Gottlober (80) (1936) 
chose the right motor area. All, however, 
used the lobe of the left ear as the location 
for their reference, or “indifferent”, electrode.
Bartley and Bishop (5) (1933) reached the 
conclusion that no truly “indifferent” lead 
was possible. Jasper and Andrews (42) 
(1936) also criticized the notion of “indif- 
ferent” or ground electrodes. They held that 
when electrodes are applied to the body the 
potential changes led off are always due to 
the sum total of the e.m.f. producing activities 
occurring between the electrodes, and also 
that unless one assumes that the entire brain, 
except the occipital lobe, is electrically dead, 
it seems improbable that a truly “indifferent” 
electrode can be placed on the head. 

In attempting to localize human cortical 
potentials through the skull, Jasper and 
Andrews (42) (1936) found bipolar leads 10 
to 20 mm. apart to be somewhat better than 
monopolar leads. Rheinberger and Jasper 
(68) (1937) reported better differentiations 
of simultaneous electroencephalograms from 
different brain areas by the bipolar method 
of recording. In a still later publication, 
Jasper and Andrews (43)(1938) used the 
diffused lead taken from the ear lobe as a 
check, but maintained that, except under spe- 
cial conditions, the diffused lead technique 
did not give as good localizations. These 
same authors have also shown that it is pos- 
sible to work out standard placements for 
bipolar electrodes which will permit a high 
degree of differentiation between the various 
regions of the brain. 

After a general review of the subject Jasper 
(41) (1937) concluded that: “Both the mono- 
polar and bipolar methods have their advan- 
tages and disadvantages. The selection of 
one method or the other should be determined 
by the purpose of the particular experiment”. 
(p. 420) 


V. DESCRIPTION OF THE APPARATUS AND 
RECORDING TECHNIQUES USED IN THE 
CHILD DEVELOPMENT LABORATORY OF THE 
UNIVERSITY OF WISCONSIN 


Figure 1 presents the floor plan of the Wis- 
consin laboratory, while figure 2 shows, in a 
schematic fashion, the interrelationships of 
the various items of apparatus. Figure 3 
shows a corner of the shielded room in which 
our main low-frequency amplifiers are located. 
This room is 9% feet long, 7/2 feet wide, and 

Y4 feet high. The walls and ceiling are lined 
with fine mesh copper screen, the floor is 
covered with solid copper sheeting, all joints 
of which are soldered together, and the entire
setup is grounded at one point. All items 
of equipment in this room are battery oper- 
ated and communication with the outside is 
effected by means of a pneumatic switching 
arrangement. 


The main low-frequency amplifiers* employed
in this laboratory (figures 3, 4, 5, 6,
and 7) are three-stage, push-pull, high-gain
amplifiers, all tubes pentodes, and battery
powered, which make possible a voltage
amplification in excess of twenty million,
and are capable of operating at a noise
level of slightly less than 2 microvolts in the
input. With preparations of an ordinary
value of resistance, one, and even half microvolt
signals, can be distinguished. The output
is suitable for operation of: the cathode
ray tubes, the single stage consisting of a
single 6A6 in push-pull connection, A.C. or
battery powered, and the power amplifier,
which in turn feeds the dynamic speaker and
the ink recorders.
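As a rough illustration of what these figures imply, the following sketch (a modern aid, not part of the original apparatus) expresses a voltage amplification of twenty million in decibels and refers an output amplitude back to the input. The function names are hypothetical, and the figure of twenty million is taken as the lower bound stated in the text.

```python
import math


def gain_db(voltage_gain: float) -> float:
    """Express a voltage amplification ratio in decibels."""
    return 20.0 * math.log10(voltage_gain)


def input_referred_microvolts(output_volts: float, voltage_gain: float) -> float:
    """Input amplitude, in microvolts, that would produce the given output."""
    return output_volts / voltage_gain * 1e6


# Lower bound quoted in the text ("in excess of twenty million").
GAIN = 20_000_000

if __name__ == "__main__":
    print(f"voltage gain in decibels: {gain_db(GAIN):.1f}")
    # A noise level of 2 microvolts at the input, multiplied by the full gain.
    print(f"2 microvolts at the input becomes {2e-6 * GAIN:.0f} volts at the output")
```

The sketch treats the gain as a single fixed ratio; in practice the amplification would vary with frequency and with the settings of the individual stages.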


In the design and construction of these 
amplifiers great pains were taken to assure 
adequate response to frequencies between 1 
and 100 cycles per second in order to insure 
maximum usefulness in the study of the elec- 
troencephalogram. The amplifiers will, how- 
ever, pass signals with somewhat less gain up 
to 10,000 cycles per second. 


The relative impossibility of properly bal- 
ancing push-pull stages with commercially 
available components led to the use of a 
single tube in the second stage with the plate 
of the lower input push-pull tube coupled to 
ground. The fact that one-half of the signal 
is thrown away is more than compensated for 
by the diminution of difficulties which would 
otherwise be encountered. The purpose of 
the adjustable condenser coupling by means 
of the tap switch between the first and second 
stage is to eliminate or control 60 cycle 
interference and low-frequency oscillation 
if either cause trouble. With tap switch 
to smaller condensers, the time constant is 
lowered and the circuit is less sensitive to low 

* The original amplifiers used in this research were designed
and built by Mr. Lovett Garceau, of the Electro-Medical Laboratory,
Inc., of Holliston, Massachusetts. The circuit was
dated January 30, 1936. Since then Mr. Edwin Bernet has
furnished valuable technical aid, and has made changes in
design and construction to keep the amplifiers up to date.
frequencies. The use of five-inch cathode ray
tubes in the recording unit requires consid- 
erable plate voltage which necessitates push- 
pull in the last stage. This is done by a 
simple and standard phase inversion scheme 
evident from the diagram. (figure 6) The 
particular scheme of phase inversion used is 
not linear at higher frequencies, but is quite 
satisfactory over the range covered in the 
present experiments. 
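The effect of switching to smaller coupling condensers can be sketched numerically. The example below assumes a simple resistance-capacitance coupling between stages, which is a simplification of the actual circuit of figure 6; the 1-megohm grid resistance and the three condenser values are hypothetical and are chosen only to show the trend.

```python
import math


def time_constant(r_ohms: float, c_farads: float) -> float:
    """Time constant of an R-C coupling, in seconds."""
    return r_ohms * c_farads


def cutoff_hz(r_ohms: float, c_farads: float) -> float:
    """Frequency at which the coupling passes about 70.7% of the signal."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)


def coupling_gain(f_hz: float, r_ohms: float, c_farads: float) -> float:
    """Fraction of a sine wave of frequency f_hz passed by the coupling."""
    x = f_hz / cutoff_hz(r_ohms, c_farads)
    return x / math.sqrt(1.0 + x * x)


# Hypothetical values: a 1-megohm grid resistance and three tap positions.
GRID_RESISTANCE = 1.0e6
TAP_CONDENSERS = (1.0e-6, 0.1e-6, 0.01e-6)

if __name__ == "__main__":
    for c in TAP_CONDENSERS:
        tau = time_constant(GRID_RESISTANCE, c)
        passed = coupling_gain(1.0, GRID_RESISTANCE, c)
        print(f"C = {c * 1e6:g} mfd.: time constant {tau:g} sec., "
              f"fraction of a 1-cycle signal passed {passed:.3f}")
```

Each step to a smaller condenser shortens the time constant and reduces the response to slow potential changes, which is the behavior described in the text.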


An adjustable potentiometer makes it pos- 
sible to set the amplitude of the phase invert- 
ing circuit so that the signals in each half of 
the third stage are balanced. A special cal- 
ibrating unit makes it possible to check on 
the performance of the amplifiers at all times. 
As part of the calibrating equipment a General
Radio Type 377-B Low-Frequency Oscillator,
with a frequency range of from 10 to
70,000 cycles, is used. This unit also fur- 
nishes time lines in connection with the 
photographic recorders. 


The amplifiers used in this research incor- 
porate such conveniences as very simple 
switching mechanisms supplied solely by the 
automatic action of the plugs in the jacks, 
permitting the use of either grounded or bal- 
anced input circuits. The balanced input cir- 
cuit used is a recent development. Its pur- 
pose is two-fold: (1) to allow simultaneous 
operation of several amplifier channels on a 
single subject, and (2) to permit interference- 
free operation when the subject is not shielded 
from induction caused by power lines. (This 
is indispensable in field work). 


Workers in this field have had considerable 
trouble with input circuits, i.e. from subject 
to first stage, due to the large possible pick-up 
of the subject’s body, contact potentials and 
variations at the electrodes on the subject, 
long leads to the grids of the amplifiers, and 
the general difficulty of finding the proper 
place to ground when such very low voltages 
are to be amplified with low input resistance. 
The balanced input arrangement used in the 
low-frequency amplifiers, with the whole 
input 9 megs up from ground, is a convenient 
way of avoiding these difficulties. The input 
network simply floats about on top of the 9 
megs with slight D.C. changes and only the 
A.C. is amplified. 

FIGURE 2
GENERAL SCHEMATIC DIAGRAM SHOWING THE
INTERRELATIONSHIPS OF THE EQUIPMENT

FIGURE 3
CORNER OF SHIELDED ROOM
This photograph shows the two matched battery oper- 
ated low-frequency amplifiers and the General Radio Type 
377-B Low-Frequency Oscillator used for calibration 
purposes. 

FIGURE 4
FRONT VIEW OF AMPLIFIERS 

The two upper units on the left are the matched low- 
frequency amplifiers which are completely battery operated. 

The lower unit on the left is the extra stage which is 
either A.C. or D.C. operated. 

The matched power amplifiers are shown on the right. 
(Only one dynamic speaker is shown.) 

FIGURE 5
BACK VIEW OF AMPLIFIERS
The matched power amplifiers are on the left.
The matched low-frequency amplifiers are on the right.
In the lower right-hand corner the extra stage which is
A.C. or D.C. operated is shown. This amplifier appears
with its shield in place. All other shields are removed to
reveal details of construction.

FIGURE 6
WIRING DIAGRAM OF MAIN LOW-FREQUENCY
AMPLIFIERS

All tubes are especially selected type 6C6.

FIGURE 7
WIRING DIAGRAM OF SINGLE STAGE PUSH- 
PULL AMPLIFIERS 


The output here is suitable for operation of the cathode 
ray oscillographs and the power amplifiers. 

Most of this research is done without the
use of electrical filters* but for certain spe- 
cial purposes three different filters (figure 8) 
have been built. The R.C. low-pass filter 
uses 2 shunt capacitors separated by a series 
resistance. The capacitors are specially se- 
lected so that their impedance is approxi- 
mately equal to the impedance of the output. 
The series resistors are of the same order of 
magnitude as the output impedance. This 
filter can be adjusted so that it passes up to 
12 or 15 cycles without serious attenuation, 
and rapidly eliminates frequencies above this. 
The second filter is an anti-resonant 40-cycle
section consisting of a special inductor and a 
General Radio 219-N two dial Decade Con- 
denser. The 200-cycle, high-pass filter works 
in and out of 20,000 ohms, and is terminated 
on the output end in 40,000 ohms. It con- 
sists of a special General Radio Type 830-D 
200-cycle High-Pass filter. The use of each 
of these filters requires a change to 10 mfd. 
coupling condensers in the output network 
of the amplifiers. This and other needed 
changes are effected by a double-throw
double-pole switch on the front panel. 
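The behavior of the R.C. low-pass filter described above (two shunt capacitors separated by a series resistance) can be sketched as follows. The calculation assumes a purely resistive source and an unloaded output, neither of which is stated in the text. The component values are hypothetical: 20,000 ohms is borrowed from the figure given for the high-pass filter, and the capacitors are chosen so that their reactance equals that resistance at 15 cycles.

```python
import math


def crc_lowpass_gain(f_hz: float, r_source: float, r_series: float,
                     c1: float, c2: float) -> float:
    """Magnitude of output/input for: source resistance, shunt C1,
    series resistance, shunt C2, output taken across C2 (no load)."""
    s = 2j * math.pi * f_hz
    denominator = (1.0
                   + s * (r_source * c1 + r_source * c2 + r_series * c2)
                   + s * s * r_source * r_series * c1 * c2)
    return abs(1.0 / denominator)


# Hypothetical values, for illustration only.
R_OUT = 20_000.0                                   # assumed output impedance, ohms
R_SERIES = 20_000.0                                # same order of magnitude
C_SHUNT = 1.0 / (2.0 * math.pi * 15.0 * R_OUT)     # reactance equals R_OUT at 15 cycles

if __name__ == "__main__":
    for f in (1.0, 5.0, 10.0, 15.0, 30.0, 60.0, 100.0):
        g = crc_lowpass_gain(f, R_OUT, R_SERIES, C_SHUNT, C_SHUNT)
        print(f"{f:6.1f} cycles: fraction passed {g:.3f}")
```

With these illustrative values the response falls steadily with frequency; the actual corner of the laboratory filter depends on the adjustment described in the text and is not reproduced here.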

The electrodes used in the Wisconsin lab- 
oratory are of two types. One type consists 
of a silver-silver chloride wire inserted into a 
glass T-tube filled with 10% saline solution, 
making contact with the skin by means of a 
cotton plug, and held in place by an elastic 
band. The other consists of a small silver 
spiral making contact with the skin through 

* The electrical filters developed for this research followed
the design furnished by Mr. A. E. Thiessen, of the General
Radio Company.

Figures 4, 5, 9, and 11 show the power
amplifiers used to operate the loud-speakers 
and the ink recorders. 

The amplifier system used in this program
of research records the electrical activity of
the nervous system by means of: (1) ink-writing
recorders, and (2) cathode ray oscillographs.
Loud-speakers may also be connected
to the amplifier outputs for aural
checkup.


The power amplifiers, ink-writing record- 
ers, timing units, and the calibrating and re- 
cording oscillographs are all located outside 
of the shielded room. (figures 9, 19, and 20) 
Figure 9 shows the power amplifiers, loud
speakers, ink recorders and cathode ray as- 
sembly utilized for a continuous visual check 
on the functioning of the apparatus. 

The ink recorders (figure 10) are powered 
with a single synchronous motor and a gear 
mechanism so that they are in step with each 
other, and their speed is electrically locked 
with that of the recording camera. A range 
of six speeds can be secured by means of the 
gear-shift mechanism which was specially de- 
signed and built for this unit. The sensi- 
tivity of the instruments was increased better 
than 200% by designing and making new 
tension springs for the moving elements. By 
means of a sensitive pneumatic switch the 
main recording camera can be started and 
stopped at any time by the experimenter who 
is observing the output of the ink recorders. 


FIGURE 8
ELECTRICAL FILTERS

The unit on the left contains the 200-cycle high-pass
filter and the R.C. low-pass filter.

The three units on the right in functional relationship
with the amplifiers constitute the anti-resonant 40-cycle
section.

FIGURE 9
ROOM ADJOINING THE SHIELDED ROOM


Matched power amplifiers. 

Ink-writing recorders. 

Electrical signaling device. 

Interference detector. 

General Radio Type 528 Cathode-Ray Oscillograph 
Assembly used for continuous visual check on perform- 
ance of equipment and experimental phenomena. 

Sweep circuit for cathode ray oscillograph. 

Special switching unit. 

FIGURE 10


CLOSEUP OF INK-WRITING RECORDERS AND
ACCESSORY EQUIPMENT 

Gear shift mechanism. 

Synchronous motor. 

Rheostats. 

Timing unit. 

Switch operated pneumatically from shielded room. 

Electrical signaling units. 

These recorders are locked electrically in speed with 
the camera by means of duplicate synchronous motors. 

FIGURE 11
WIRING DIAGRAM OF POWER AMPLIFIERS 


The output here is to the ink-writing recorders. 

The recording camera (figures 15, 16, 17,
18, and 19) is so designed and built that the 
range of speeds of the film past the lens ex- 
tends from 1 inch per second to 30 feet per 
second. This range is secured by the use of 
interchangeable gears and motors. At all 
lower speeds it is locked electrically with the 
ink recorders thus making the records directly 
comparable. The recording camera is
equipped with a special ground-glass focusing
arrangement and uses three interchangeable
lenses (Carl Zeiss Biotar 50 mm. F 1:4,
Taylor-Hobson Cooke Panchro Anastigmat 
108 mm. F 2:5, and Taylor-Hobson Cooke 
Kinic Anastigmat 6'4 inch F 3:5). Special 
screws afford extremely accurate focusing, 
and a side adjustment coupled with a revers- 
ing mechanism makes it possible to have as 
many as four records side by side on one 
strip of 35 mm. film. A reversing switch
permits the film to be run backwards without
rewinding. A series of special lens mounting
rings make possible a wide variety of image 
sizes ranging from a magnification of 1/14 to 
2 times. The magazine chamber will hold up 
to 1000 feet of film. The driving mechanism 
is geared and the film speed past the lens is 
constant no matter in which direction the film 
is traveling. Thruout, ½ inch metal light
seals are employed. The amount of film used 
and the amount remaining in the magazine 
can be accurately read from a counter which 
is geared to the driving mechanism of the 
camera. 
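The practical meaning of the speed range and magazine capacity given above can be worked out by simple arithmetic. The sketch below uses only figures stated in the text (1 inch per second to 30 feet per second, 1000 feet of film, and the 50-cycle time line of the sample records); the function names are hypothetical.

```python
def running_time_seconds(film_feet: float, speed_inches_per_second: float) -> float:
    """Seconds of continuous recording available from a length of film."""
    return film_feet * 12.0 / speed_inches_per_second


def inches_per_cycle(speed_inches_per_second: float, time_line_hz: float) -> float:
    """Length of film occupied by one cycle of the time line."""
    return speed_inches_per_second / time_line_hz


SLOWEST = 1.0            # inches per second
FASTEST = 30.0 * 12.0    # 30 feet per second, expressed in inches per second
MAGAZINE_FEET = 1000.0
TIME_LINE_HZ = 50.0

if __name__ == "__main__":
    for speed in (SLOWEST, FASTEST):
        minutes = running_time_seconds(MAGAZINE_FEET, speed) / 60.0
        spacing = inches_per_cycle(speed, TIME_LINE_HZ)
        print(f"{speed:6.1f} in. per sec.: {minutes:8.2f} min. per full magazine, "
              f"{spacing:.3f} in. of film per time-line cycle")
```

At the slowest speed a full magazine lasts well over three hours, while at the fastest it is exhausted in about half a minute, which suggests why the high speeds would be reserved for brief phenomena.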


Much experimenting with different film
emulsions and different photographing, sensitizing
and developing procedures has resulted
in a technique which makes possible
the securing of excellent records which can be
reproduced by means of the zinc etching
process without retouching of any kind.
(figures 12, 13, and 14) 


The camera is used to photograph the moving
spot on the cathode ray screen. (figures
16 and 19) The main oscillograph is a Du
Mont Type 158 specially adapted for use 
with a five-inch Du Mont tube with a blue 
screen and a very rapid decay period. For 
time lines a pair of Du Mont Type 164 oscil- 
lographs with 3-inch blue tubes are used. 
(One of these with the prism arrangement 
used for optical reasons is shown in figure 
19). The time line is generated by a General
Radio Type 377-B Low-Frequency Oscillator.
For photographing simultaneous amplifications
of the electrical activity of the
cortex two specially matched cathode ray
oscillographs with five-inch blue tubes are
used. (figure 20)


All of the recording equipment shown in 
figure 19 is kept in a light tight room and 
is operated from the outside. This insures a
maximum of brilliancy on the cathode ray 
screen and eliminates interference from ex- 
traneous light sources. (The experimenter 
outside of the shielded room has a constant 
visual check by means of the General Radio 
Type 528 Oscillograph Assembly which is in
parallel with the recording oscillographs). In 
figures 16 and 19 a black slotted screen over 
the end of the cathode ray tube is shown. 
This serves to cut off all light inside the tube 
except that generated by the spot, and 
greatly increases the sharpness of the line 
traced on the photographic film. This screen 
is used on all tubes during photographic 
recording. 

Work with the electrical activity of the 
brain requires very delicate and sensitive
equipment and careful recording techniques. 
Artifacts are numerous and sometimes ex- 
tremely difficult to detect. The apparatus 
used in the Wisconsin laboratory has been 
constructed of the finest materials available 
at the present time, and no pains have been 
spared to insure dependable performance. In 
addition to this care in the design and con- 
struction of the equipment, all of the appa- 
ratus, with the exception of the recording 
camera, has been constructed in duplicate 
with the result that the faithfulness of the 
operation of each unit may be compared at 
any time with that of a carefully matched 
unit. This enables the experimenter to check 
upon the reliability of any part of the equip- 
ment at any time. In the actual collection of 
data, leads from the same pair of electrodes 
are run to completely independent but 
matched amplifiers, ink recorders, and cath- 
ode ray oscillographs. If the records secured 
under these circumstances are not identical, 
the data are not used and the trouble is in- 
vestigated. Actually, in much of our research 
this duplicate record is secured thruout the 
experimental period. When the conditions of
the experiment render a continual check impossible,
this part of the calibration is run before
the experimentation proper, in the middle
of the experimentation, and upon completion
of the experimentation. The bulk of
the work for the past four years has involved
the design, construction, installation, and perfection
of equipment, the securing of developmental
norms for electrical activity of the
cortex under varied simple and complex
forms of behavior, and the development of
experimental and evaluative procedures. Subsequent
papers will deal with evaluative procedures
and the studies of specific problems.

FIGURE 12
This figure is a full size halftone reproduction of a completely
unretouched sample record secured with the equipment
described in this paper. The upper and lower records
are 50 cycle time lines.

FIGURE 13
This figure is the same full size unretouched record
given above, but in this case it has been reproduced by
means of the zinc line etching process.

FIGURE 14
This figure shows a portion of the above record enlarged
to the size visible on the cathode ray tube before photographing.


FIGURE 15 
FULL VIEW OF CAMERA 


Camera proper. For details see subsequent photo- 
graphs. 

Special fittings for auxiliary lenses. 

Extra motor unit for altering range of speeds. 

Rear focusing control.

One of 4 jacks for leveling, raising, and steadying. 

FIGURE 16
CLOSEUP OF CAMERA AND OSCILLOGRAPH


Special viewing and aligning unit. 

Ground-glass focusing device. 

Mechanism for driving film. 

Du Mont Type 158 Oscillograph especially adapted 
for use with a 5-inch Du Mont cathode ray tube. 


The special light shield with its narrow slot is shown
in place in front of the cathode ray tube. 

FIGURE 17


CLOSEUP OF CAMERA 


1. Front focusing control.

2. Horizontal adjustment mechanism.

3. Reversing switch.

4. Film footage counter.

5. Interchangeable gears.

FIGURE 18
CAMERA ACCESSORIES 


The front row shows five of the special lens mounting 
rings for changing the image size on the film. 
1. Prisms used in aligning cathode-ray tubes for 
photographing. 
Special test shot camera. A Taylor-Hobson Cooke
Panchro Anastigmat 4¼ inch F 2:5 lens is shown.
Taylor—Hobson Cooke Kinic Anastigmat 6!2 inch F 3:5 
lens. 
Carl Zeiss Biotar 50 mm. F 1:4 lens. 
Attachment employed when camera is used outside 
dark room. 


Motor for special speed range. 
One of a set of auxiliary gears for changing speed range 
of the camera. 

FIGURE 19


CAMERA, OSCILLOGRAPHS AND OSCILLATOR


Camera previously described. 

Du Mont Type 164 oscillograph used for time line. 
Prism arrangement for photographing spot on 3-inch 
cathode ray tube. 

Main recording oscillograph. (Du Mont Type 158) 
General Radio Type 377-B Low-Frequency Oscillator
used for generating the time line.

FIGURE 20


MATCHED CATHODE RAY OSCILLOGRAPHS 


1 and 2. Five-inch blue cathode ray tubes. 

3. Power supply. 

4 and 5. Front silvered mirrors for bringing the spot move- 
ments together on the 35 mm. film. 

THE EFFECT OF WEIGHTS ON CERTAIN INDEX NUMBERS


DOUGLAS E. SCATES and VIRGINIA FAUNTLEROY
Bureau of Research, Cincinnati Public Schools


Index numbers have received a substantial
use in the field of education. On a number
of occasions they have been employed in the
rating of educational activities in the various
states. The pioneer, and perhaps best
known, work in this field was done by Ayres,
in 1912 and 1920. Phillips followed with
slight modifications in 1924 and again in
1932. Schrammel presented another index
number in 1926, and, with Sonnenberg, later
brought the calculations up to 1934. Their
work was revised by Scates in 1937. The
Research Division of the National Education
Association set forth five elements of school
efficiency in 1932.


In connection with all of these index numbers the problem of
weighting has been present. Ayres left the impression that his series
were unweighted. It is true that he did not apply special weights to
them; yet the mathematical functions of the data which he took
affected the relative variability of the different series, or traits,
and hence affected their weights. Phillips, and Schrammel and
Sonnenberg, attempted to solve the problem by using ranks, which make
all series (traits) of equal weight. But equal weighting may be no
better than, and perhaps not as good as, natural weighting (the
relative variability in the data as they are observed). The problem
of weighting in index numbers of this type is inescapable; and to
resort to ranking, or other forms of equal weighting, is probably
more arbitrary than to select a set of weights that is judged to be
reasonable.

There has, however, been a general indis- 
position to face the problem of weighting 
directly. Weighting appears to have been re- 
garded as a matter of great danger; no one of 
those who have worked in this field has shown 
any willingness to venture a set of reasonable 
weights for the problem on which he worked. 
The Research Division of the National Education Association,
realizing that natural weighting and equal weighting were not
ultimate solutions for the problem, refrained from making any
combination of the five traits they set forth rather than assume the
risk of using weights which might be in error.


A review of the work in this field leads one to raise the question,
How important is weighting in this type of index number? Is it likely
to be as important as the selection of the original traits? Is it
likely to cause more error than the unreliability in the basic data?
Is it any more important than the several other matters on which
judgment must be exercised in the preparation of index numbers? Such
questions are too broad to be answered by any single study; they will
require much research. Studies of the validity and reliability of the
basic data must be far reaching. It is possible, however, to make a
preliminary attack on the problem by ascertaining just how much
effect different weights are likely to have, and judging whether the
degree of variation produced is unacceptable.

The present study deals with the application of various sets of
weights to the different traits or factors (sometimes spoken of as
criteria) which have been used in four published studies, three of
which were ratings of school systems and the fourth on cost of
living. The purpose of this study is to ascertain the effects of
different sets of weights on the resulting index numbers under normal
working conditions. For that reason data from published studies
rather than theoretical data were used to experiment with. The
question is after all a practical one more than a theoretical one.

The studies from which data were taken 
and the technique of experimentation, will be 
made clear in connection with the description 
of each experiment. 

I. NATIONAL EDUCATION ASSOCIATION DATA
ON STATES (ACTUAL VALUES)


The first experiment was performed on the 
five series of data which the Research Divi- 
sion of the National Education Association 
presented in 1932 to represent five different 
aspects of efficiency of state educational activities.
These traits are as follows:

1. The proportion of children who are in school.

2. The holding power of the schools.

3. The quality of teaching provided (indexed by salaries paid
teachers).

4. The school environment (indexed by the value of school property
per pupil).

5. The per cent of literacy.


More specific definitions of each trait are 
given in the original source. 

The questions with which the present study is concerned are: What
would have happened if weights had been assigned to these traits and
the traits had been combined into an index number? If the weights
used had not been exactly correct, how much error probably would have
resulted? In general, how much risk is involved in the assignment of
reasonably forceful weights to such a set of series?


The Weights Used for Experimentation 

To seek an answer for these questions, an 
arbitrary set of weights was selected. The 
most obvious weights to use on five series 
would probably be 1, 2, 3, 4, 5, or some mul- 
tiples of these numbers. It was decided how- 
ever that a more forceful set should be used, 
in order to subject the effects of weighting to 
a more searching test. A set of weights hav- 
ing a maximum ratio of 11 to 1 was therefore 
selected. It was believed that this would rep- 
resent as great a ratio as would likely be used 
in the majority of practical situations. That 
is, in building up an index number for rating 
purposes, one will select and use traits which 
he regards as important; very minor, insig- 
nificant traits, will not be included. In most 
cases, traits which are judged to have a value 
of less than 1/10 or 1/11 the value of some
other trait will probably be omitted from the 
index number. A set of weights having a 
maximum ratio of 11:1 was therefore thought 
to be satisfactory for experimental purposes. 
The intervening three weights were put at 
5, 6, and 7. 


The result was a set of five weights, two of which differ
substantially, and three of which differ only slightly. When applied,
this set of weights would be interpreted to mean that three of the
five traits weighted by it are judged to be of about equal
importance, though differing somewhat among themselves; that one
trait is judged to be about twice as important as the median one, and
another trait about one-sixth as important as the median one. The
results of this first experiment should hold for any set of weights
having roughly these characteristics.
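
The interpretation of the weights relative to the median one can be checked with a few lines of arithmetic. The following is only an illustrative sketch; the variable names are ours and are not taken from the study.

```python
from statistics import median

# The five experimental weights named in the text.
weights = [1, 5, 6, 7, 11]

mid = median(weights)                      # the "median one"
relative = [w / mid for w in weights]      # each weight as a multiple of the median
max_ratio = max(weights) / min(weights)    # the "maximum ratio" of the set

print("median weight:", mid)
print("relative to median:", [round(r, 2) for r in relative])
print("maximum ratio:", max_ratio)
```

The heaviest weight comes out at 11/6 of the median (roughly twice) and the lightest at 1/6, which agrees with the reading given in the text.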

Each weight of course applies only to one 
series, or trait, for any given series of index 
numbers. The effect which a weight has is 
largely dependent on the series to which it 
applies. To use the weights in any one order
for the five traits would therefore afford
only a partial test of the effects that those 
weights might have on an index number—for 
it might happen that, if the same weights were 
applied to other traits than those to which 
they were first assigned, they would show 
much greater influence on the resulting series 
of index numbers. The weights must there- 
fore be tried out in different arrangements, or 
patterns, with reference to the five traits. 

To make a complete test of the effect of a set of weights on a given
number of series, or traits, would require that the weights be used
on the series in every possible arrangement. That is, with the series
kept in a given order, the weights would be applied 1, 5, 6, 7, 11;
1, 5, 6, 11, 7; 1, 5, 7, 6, 11; and so on, through all of the
possible permutations. This however would result in 120 different
arrangements of the five weights. Most of these arrangements would
differ so slightly from each other that it did not seem important to
work out such a complete test. Instead, a small sample of these
arrangements was used. The five weights were rotated, by moving them
along one trait at a time. That is, a series of index numbers was
calculated with the weights in one position; then the weights were
moved along one trait, and another series of index numbers was
calculated, and so on. The different arrangements of the weights that
were used on the five traits are shown in Table I.

It is recognized that this rotation does not 
represent a perfect sampling of the 120 pos- 
sible arrangements; but objections were found 
to every sample considered, and the rotation 
method was adopted as yielding a fairly satis- 
factory indication of what the weights might 
cause. This set of patterns at least represents 
violent changes, for the series which receives 
the lowest weight one time receives the high- 
est weight the next time. 
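
The count of 120 arrangements and the five rotated patterns can be reproduced as follows. This is a minimal sketch; the helper `rotate_right` and the dictionary `patterns` are illustrative names of our own, and the letters A to E follow the designations of Table I.

```python
from itertools import permutations

weights = (1, 5, 6, 7, 11)

# Every possible assignment of the five weights to the five traits.
n_arrangements = len(set(permutations(weights)))

def rotate_right(seq, k):
    """Move every weight k traits along, wrapping around at the end."""
    k %= len(seq)
    return seq[-k:] + seq[:-k] if k else seq

# The rotation sample: pattern A is the starting order, B moves the
# weights along one trait, C two traits, and so on.
patterns = {label: rotate_right(weights, k) for k, label in enumerate("ABCDE")}

print(n_arrangements)
for label, pattern in patterns.items():
    print(label, pattern)
```

Each trait receives each of the five weights exactly once across the five patterns, and the trait carrying the weight 1 in one pattern carries 11 in the next.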


Procedure 


The steps of the experiment are largely apparent from the foregoing
discussion of the use of the weights. First, each series was divided
through by its standard deviation in order to make all of the traits
of equal weight. A series of index numbers for the forty-eight states
was calculated for this equal weighting as a basis for certain
comparisons. Then the set of five weights was applied to the five
series, and a second series of index numbers was calculated for the
forty-eight states. Then the weights were shifted one trait, and a
third set of calculations was made. This was continued for each of
the six special weighting patterns shown in Table I.


TABLE I

ARRANGEMENTS OF WEIGHTS USED ON NATIONAL EDUCATION ASSOCIATION DATA
ON STATE EDUCATIONAL ACTIVITIES TO PRODUCE EXPERIMENTAL INDEX NUMBERS

Designation of          Weights Used on the Five Series
Weighting Pattern       I       II      III     IV      V
Natural                 2.4     1.8     99      22      1
Equal (0)               1       1       1       1       1
A                       1       5       6       7       11
B                       11      1       5       6       7
C                       7       11      1       5       6
D                       6       7       11      1       5
E                       5       6       7       11      1

Natural weights are those which the series have, as observed. The
standard deviation of each series is divided by the smallest standard
deviation, thus expressing the relative variability of the five
series as a set of ratios, the smallest being unity.

Equal weights (designated by 0) were obtained by dividing the values
in each observed series (naturally weighted) by the standard
deviation of that series, thus reducing its variability to 1 S.D.

All weights shown in patterns A-E were applied to the series after
the series had been reduced to equal weighting.

The calculation of a series of index numbers 
involved simply the summation of the values 
for each state given by the five traits, after 
each trait had been weighted (multiplied) as 
described. The result was a series of forty- 
eight sums, each representing an index num- 
ber. While index numbers are usually ex- 
pressed as ratios—calling for the division of 
the series by some selected base—it was un- 
necessary for the present purpose to make 
such a division, since dividing the series by a 
constant does not affect its correlation with 
any criterion. The series of forty-eight sums 
was therefore used directly as a series of in- 
dex numbers. A technical discussion of the 
formula underlying this procedure is given at 
the end of this paper. 
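
The procedure just described (reduce each series to unit standard deviation, multiply by its weight, and sum across traits) can be sketched as below. This is an illustration under stated assumptions, not the authors' worksheet: the function names are ours, the population form of the standard deviation is assumed because the text does not say which form was used, and the three short series are invented solely to make the example run.

```python
from statistics import pstdev

def equalize(series):
    """Divide a series by its standard deviation so its variability is 1 S.D."""
    sd = pstdev(series)  # assumption: population S.D.; the text does not specify
    return [x / sd for x in series]

def natural_weights(traits):
    """Each series' S.D. divided by the smallest S.D. (smallest becomes unity)."""
    sds = [pstdev(s) for s in traits]
    smallest = min(sds)
    return [sd / smallest for sd in sds]

def index_numbers(traits, weights):
    """Weighted sum, per unit (state), of the equalized traits."""
    equalized = [equalize(s) for s in traits]
    n_units = len(traits[0])
    return [sum(w * s[i] for w, s in zip(weights, equalized))
            for i in range(n_units)]

# Hypothetical data: three traits observed on four units (not from the study).
traits = [
    [60, 70, 80, 90],
    [1.0, 3.0, 2.0, 4.0],
    [500, 500, 600, 800],
]

print(natural_weights(traits))
print(index_numbers(traits, [1, 1, 1]))    # equal weighting
print(index_numbers(traits, [1, 5, 11]))   # one hypothetical special weighting
```

Because the sums are used only for correlation and ranking, they are left undivided by any base, as the text explains.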
In order to facilitate certain comparisons,
these series of sums (or index numbers) were 
converted into ranks. This step afforded a 
fairly significant unit for measuring displace- 
ment. The product-moment correlations 
which are recorded, however, were based 
directly on the actual values of the sums and 
not on the ranks of these sums. 


Index Number Ranks for the States 


The ranks of the states on the variously 
weighted index numbers are shown in Table 
II. So far as is known, this is the first time 
index numbers for the states have been calcu- 
lated from these data. Certainly it is the first 
time that index numbers have appeared with 
these particular weights. As previously
stated, the National Education Association
refrained from combining the five traits
because of a hesitancy in assigning specific
weights to them.

The data in Table II, although reported 
here for experimental purposes, may be added 
to the literature on state index numbers, along 
with those reported previously by Ayres, Phil- 
lips, Schrammel and Sonnenberg, etc. One 
may take his choice between the seven series 
of index numbers shown in this table, accord- 
ing to his judgment as to the best set of 
weights. The series appearing under the 
heading, “Equal Weights,” is most compar- 
able to the index numbers previously reported 
by others, who have consistently used some 
form of equal weighting. 

This particular series of index numbers 
(equal weighting) correlates with those reported
recently in Scates’ revision of the
Schrammel and Sonnenberg numbers to the 
extent of .928. The two sets of data are for 
approximately the same date, though the 
N. E. A. data represent a three- or four-year 
earlier status than do most of the traits in the 
other index number. 


Discussion of Rank of States 


From the data given in Table II, one may 
inspect the results of the various weightings, 
and form a preliminary conclusion as to the 
effects which these have. To some, this in- 
spection may offer a more concrete and satis- 
fying basis for conclusions than any of the 
analyses which follow. 

In the first data column, ranks are given for 
an index number calculated directly from the 
five traits as they were originally given— 
without any change in the observed (natural) 













TABLE II

RANK OF STATES ON INDEX NUMBERS DERIVED FROM FIVE TRAITS SUGGESTED BY
THE N.E.A. THESE TRAITS ARE WEIGHTED IN VARIOUS WAYS

Rank of States on Index Numbers from Each of Seven Different Sets of
Weights

State     Natural     Equal (0)     A     B     C     D     E

[Body of table illegible in the source; the state names and the
columns of ranks cannot be matched.]

The code for the weighting patterns applied to the component traits is given in Table I.
When the five component traits are ranked before weighting and combining, the results are
those shown in Table VIII.
weighting. While these naturally weighted results are not a part of
this experiment, they are nevertheless a source of some interest. It
will be noted from Table I that the series in this form differ
greatly among themselves in weight; that trait III had a variability
(weight) nearly one hundred times as great as trait V. It would
therefore be expected that, when the traits were combined in this
form (with natural weighting), trait III would dominate the resulting
index numbers. That is, each state would be placed in the series of
index numbers pretty much according to its relative position in trait
III. This actually occurs; and, because of the extreme weighting, the
correlation between trait III and the index number for all traits,
with their natural weightings, is about .98.

To calculate index numbers with such an extreme weighting would
scarcely be done, for there would be little use in including all of
the five traits. One would stake his complete dependence upon trait
III instead. Or, since trait IV carries a natural weight of 22
against 99 for trait III (Table I), one might include it also. But
little would be gained by adding in traits I, II, and V, which have a
combined weight of 5.2, which is only 1/23 the combined weight of
traits III and IV.
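
The figures 5.2 and 1/23 follow directly from the natural weights listed in Table I, as this short check shows. The dictionary and variable names are ours, for illustration only.

```python
# Natural weights of the five traits, as listed in Table I.
natural = {"I": 2.4, "II": 1.8, "III": 99, "IV": 22, "V": 1}

minor = natural["I"] + natural["II"] + natural["V"]   # traits I, II, and V
major = natural["III"] + natural["IV"]                # traits III and IV
ratio = major / minor

print(round(minor, 1), major, round(ratio, 1))
```

The combined weight of traits III and IV is a little over 23 times that of the other three traits together.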

In spite, however, of the extreme weighting of traits III and IV when
the natural variability is left intact, it is interesting to observe
from Table II that this weighting gives index number values that are
rather consistently close to those in the other columns. Arizona and
Maryland are the principal exceptions; New York, California, and
others are examples. The correlation between the naturally weighted
and the equally weighted index numbers runs .916. In this particular
case then, weights having a ratio of nearly 100 to 1 do not produce
an effect that differs greatly from uniform weights.

A more detailed analysis of the differences shown in Table II is
given later.


Resulting Coefficients of Intercorrelation

In Table III are shown the correlations between the various columns
of Table II. In the left half of the table the index numbers based on
equally weighted components are shown correlated in turn with each of
the five index numbers derived from experimentally weighted series.
The two columns of correlation coefficients represent simply two
different ways of calculating the correlations. The product-moment
correlations represent calculations based directly on the actual
values of the index numbers (without grouping into class intervals).
The rank correlations represent calculations based on the index
numbers expressed as series of ranks (as shown in Table II). The two
methods of calculation [...]. The right half of Table III presents
the results of similar correlations among the index numbers based on
specially weighted components.


TABLE III

INTERCORRELATIONS AMONG INDEX NUMBERS DERIVED FROM DIFFERENT
WEIGHTINGS OF FIVE TRAITS. N.E.A. DATA

Correlations between index numbers based on equally weighted traits
(0) and index numbers based on specially weighted traits (A-E)

Weighting       Product-Moment      Rank
Patterns        Correlation         Correlation
O,A             .995                .992
O,D             .994                .989
O,E             .990                .993
O,B             .986                .975
O,C             .976                .952
Average         .988                .980

Correlations between index numbers based on various pairings of
specially weighted traits (A-E)

Weighting       Product-Moment      Rank
Patterns        Correlation         Correlation
A,D             .988                .987
D,E             .986                .983
A,B             .981                .962
A,E             .978                .987
B,E             .976                .979
B,D             .973                .960
A,C             .972                .949
C,D             .963                .943
C,E             .949                .930
B,C             .942                .886
Average         .971                .957

The letters preceding each correlation coefficient indicate the
pattern of weights (as given in Table I) used to produce the two
series of index numbers correlated.

The product-moment correlations are based on index numbers taken at
their actual values, and the rank correlations are based on the ranks
of index numbers. In both cases the five series entering into the
index numbers were taken as weighted actual values, and not as ranks.
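
The two ways of calculating the correlations can be sketched as follows. This is an illustrative sketch only: the function names are ours, the rank correlation is computed here as a product-moment correlation of the ranks (one common way of obtaining it, which the study may or may not have used), and tied values are given the average of the ranks they occupy.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation based directly on the actual values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank 1 goes to the largest value; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def rank_correlation(x, y):
    """Correlation of the two series after each is expressed as ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical index-number sums for five units (not from the study).
sums_a = [12.0, 30.5, 22.1, 18.4, 27.9]
sums_b = [11.0, 33.0, 19.5, 21.0, 26.0]
print(pearson(sums_a, sums_b), rank_correlation(sums_a, sums_b))
```

Dividing either series by a constant leaves `pearson` unchanged, which is why the undivided sums could be used directly as index numbers.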


The correlation coefficients in the left half of Table III throw
light on the question, How much effect will weights have on index
numbers, as contrasted with equal weighting? In other words, how much
“safety” can a worker feel in resorting arbitrarily and mechanically
to equal weighting, and how much danger of subjective error is there
in departing from such a routine and attempting to exercise some
judgment in assigning weights?

The answer to such questions is that, under 
the conditions of this experiment, it makes 
little difference which is done. There is little 
danger arising from the element of subjectiv- 
ity required for assigning special weights in 
this case; and, likewise, it may be said that 
little is gained in doing so, as compared with 
making the weighting equal. Three of the 
correlations are nearly perfect, when the con- 
ditions are most favorable; when the heavy 
weights happen to fall upon the series which 
are most unique (divergent) from the other 
traits, the correlation coefficient drops as low
as .95, which still is higher than the validity
which probably would be claimed for the set 
of traits, and is probably much higher than 
the reliability of the basic data. 

The right half of Table III throws light on 
the question, If special weights are to be as- 
signed, how much danger is involved that the 
weights may not be properly placed? that is, 
that the heavy weights would be assigned to 
the wrong series, the light weights to the 
wrong series, etc.? The answer of correla- 
tion to these questions is about the same as in 
the first case; even if weights are assigned in 
the worst possible way, the correlation be- 
tween these results and the results of the best 
possible assignment of weights is reasonably 
high. Of course we cannot tell from the pres- 
ent data what is the best assignment and what 
is the worst assignment; but we may look at 
the lowest correlation in the table, and say 
that that represents the greatest difference 
possible in the assignment of weights, which
is the difference between the best possible as- 
signment and the worst possible assignment. 

As a matter of practical conclusion, one will probably concede that
there is no a priori reason to feel that one would make the worst
possible assignment of weights. Anyone familiar with the field in
which he is working would probably make a pretty good assignment of
the weights. If such be granted, the extent by which his index number
might miss the ideal weighting is represented by some of the other
coefficients in Table III; perhaps, let us say, by the average, .97.


Effect of Intercorrelation of Series 

Before leaving the evidence afforded by the correlation coefficients,
we may give attention to the low correlation of .942 (with a
corresponding rank correlation of .886) for the purpose of learning
what conditions made it so low. We note that this value occurred for
the correlation between the index numbers weighted by patterns B and
C. In pattern B (according to Table I), trait II received the
lightest weight and in pattern C trait II received the heaviest
weight. We infer therefore that trait II is relatively unique, as
compared with the other four traits, and that a change in weighting
in the ratio of 11 to 1 for that trait is sufficient, when combined
with lesser changes in the weights of other traits, to cause a
definite disturbance in the resulting index number series.

We may check this inference in several 
ways. We note, from Table III, that the cor- 
relation in which the C weighting occurs is 
the lowest in the table on the left side, and 
on the right side of the same table the four 
correlations in which the C weighting occurs 
are the lowest. Evidently the C weighting, 
with its emphasis upon trait II and its slight
emphasis on trait III, is the most disturbing 
(effective) of all the five patterns. 


We may however secure more definite evidence.
It was previously pointed out that
the effect of weights was not dependent alone 
upon the value of the weights, but was condi- 
tioned also by the uniqueness of the particu- 
lar series receiving the weight. Probably the 
best measure of the uniqueness of a trait is 
its correlation with the sum of the remaining 
traits. If the weights are to be applied to 
equally weighted series to form index num- 
bers (or other composites), as in the present 
study, then these correlations for determining 
the uniqueness of each trait should be based 
on sums formed from equally weighted traits. 


The correlations between each of the five traits presented by the
N. E. A. and the sum of the remaining four traits (equally weighted),
are as follows:

Trait I   and sum of other four:  .886
Trait II  and sum of other four:  .570
Trait III and sum of other four:  .731
Trait IV  and sum of other four:  .865
Trait V   and sum of other four:  .834
Average intercorrelation between
all five traits:                  .732
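
The measure of uniqueness used here, the correlation of each trait with the sum of the remaining traits, can be sketched as below. The function names and the three short series are hypothetical and serve only to show the computation; the series are assumed to have been reduced to equal weighting beforehand, as the text requires.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def trait_vs_rest(traits):
    """Correlate each trait with the sum of all the other traits."""
    n_units = len(traits[0])
    result = []
    for k, trait in enumerate(traits):
        rest = [sum(t[i] for j, t in enumerate(traits) if j != k)
                for i in range(n_units)]
        result.append(pearson(trait, rest))
    return result

# Hypothetical series: the third moves independently of the first two.
traits = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [5, 1, 4, 2, 3],
]
correlations = trait_vs_rest(traits)
most_unique = correlations.index(min(correlations))
print(correlations, "most unique trait index:", most_unique)
```

The trait with the lowest correlation is the most unique, and it is on such a trait that a heavy weight disturbs the index numbers most.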


These correlation coefficients, taken to- 
gether with the weighting patterns, afford a 
final explanation of the low correlation B, C. 
Trait II is the most unique of all the five 
traits—distinctly so. When this trait receives 
the least weight (pattern B) and then receives 
the heaviest weight (pattern C), the correla- 
tion between the resulting two series of index 
numbers drops to .942. In fact, giving the
heaviest weighting to trait II (which pattern 
C does) seems to cause all of the correlations 
in which pattern C enters to be low. 


Variation in Leading States 


Another form of evidence concerning the 
effect of the weights is the displacement 
which they cause among the top group of 
states — say the first five. Any group of 
states would serve for illustration, but the top 
group is selected on account of the large inter- 
est that is likely to center in this group. 

Table IV shows the top five states, in order, 
under each of the weighting patterns, includ- 
ing natural weighting. Thirty out of the 
thirty-five places in the table, or 86 per cent, 
are filled by the five states which appear in 
the equal weighting column. In other words, 
we may say that the listing is 86 per cent con- 
sistent. Five places in the seven columns— 
14 per cent—are filled by “stray” states that 
are placed there by the idiosyncrasy of some 
particular weighting. Pattern C, already ob- 
served to be peculiar, accounts for two of 
these five. Outside of the natural weighting 
and the C pattern of weights, there are only 
two states in any of the five remaining lists 
that are not in all of the five lists. 
California and New York occupy the top two positions consistently in
all of the columns, except for the freakish C weighting, which places
greatest emphasis upon a trait that has a great deal that is not in
common with the remaining four traits in the index number.


Average Rank Displacement 

A third type of evidence concerning the
effect of the weights is shown in Table V.
This table is an interpretation of the correla- 
tion coefficients of Table III. It shows, for 
each of the pairings of differently weighted 
index numbers, the differences between the 
rank positions of the states in the two index 
numbers. That is, for the two index numbers 
consisting respectively of traits weighted by 
pattern O and by pattern A, the ranks of the 
forty-eight states in these two index numbers 
differ on the average by 1.2 positions, the 
maximum difference in the ranks of any state 
being four positions. Other rows of the table 
are read in the same fashion. 


The figures in the “Average Difference”
column represent a directly derived form of
mean error of estimate. The values conform
closely to those calculated by the usual
formula. They may, therefore, be read in the
usual sense; that is, knowing the rank of a 
state in the index number series based on 
equal weighting, we may estimate its rank in 
the index number series based on any of the 
five special weightings, or vice versa, with an 
average error of 1.8 ranks. This amount of 
discrepancy is a measure of the influence of 
any special pattern of weighting as contrasted 
with equal weighting. 

The right side of Table V is interpreted in 
the same way with respect to differences 
among the five special weighting patterns. 
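
The entries of Table V are simple summaries of the absolute differences between two columns of ranks. A minimal sketch follows; the function name and the two short rank lists are ours and are hypothetical.

```python
def rank_displacement(ranks_a, ranks_b):
    """Average, minimum, and maximum absolute difference in rank position."""
    diffs = [abs(a - b) for a, b in zip(ranks_a, ranks_b)]
    return sum(diffs) / len(diffs), min(diffs), max(diffs)

# Hypothetical ranks of six units under two weightings (not from Table II).
under_equal = [1, 2, 3, 4, 5, 6]
under_special = [2, 1, 3, 6, 4, 5]

average, smallest, largest = rank_displacement(under_equal, under_special)
print(average, smallest, largest)
```

Applied to the forty-eight ranks of any two columns of Table II, the same calculation yields one row of Table V.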


TABLE IV

TOP RANKING STATES ACCORDING TO INDEX NUMBERS DERIVED FROM DIFFERENT
WEIGHTINGS OF FIVE TRAITS. N.E.A. DATA

      Natural    Equal
      Weights    Weights (0)   A        B        C        D        E
1.    N. Y.      Calif.        Calif.   N. Y.    Calif.   Calif.   Calif.
2.    Calif.     N. Y.         N. Y.    Calif.   Utah     N. Y.    N. Y.
3.    N. J.      Mass.         N. J.    N. J.    Nev.     Mass.    N. J.
4.    Mass.      Ohio          Nev.     Mass.    Ohio     Ohio     Ohio
5.    Conn.      N. J.         Mass.    Mich.    N. Y.    N. J.    Mass.

Patterns of weights are described in Table I.
TABLE V

DIFFERENCES IN RANK POSITIONS OF THE FORTY-EIGHT STATES RESULTING
FROM DIFFERENT WEIGHTINGS OF FIVE TRAITS. N.E.A. DATA

Averages are for the group of forty-eight states

Equal Weights (0) and Special Weights (A-E)

Pairing of      Corre-     Aver.     Min.     Max.
Index Nos.      lation     Diff.     Diff.    Diff.
0-A             .995       1.2       0        4
0-B             .986       2.3       0        8
0-C             .976       3.1       0        13
0-D             .994       1.4       0        6
0-E             .990       1.1       0        4
Average         .988       1.8       0        7

Special Weights (A-E)

Pairing of      Corre-     Aver.     Min.     Max.
Index Nos.      lation     Diff.     Diff.    Diff.
A-B             .981       2.8       0        12
A-C             .972       3.0       0        15
A-D             .988       1.6       0        6
A-E             .978       1.5       0        7
B-C             .942       4.8       0        19
B-D             .973       2.9       0        12
B-E             .976       2.3       0        8
C-D             .963       3.4       0        13
C-E             .949       3.8       0        17
D-E             .986       2.0       0        6
Average         .971       2.8       0        11

Differences are in units of rank position. Differences occur between
pairs of index number values for each of the forty-eight states, when
the component traits have been weighted as indicated. For weighting
code, see Table I.

Correlations are product-moment values from Table III.


Consistency of Ranking of Individual States 

The fourth type of evidence of the effect of 
the weights is found in Table VI. Here the 
averaging is done perpendicularly to that rep- 
resented in the preceding table. Here we are, 
in effect, going back to Table II, and sum- 
marizing the changes that occur on each line. 
We have here not the average for the series of 
forty-eight states, but the average displace- 
ment for each individual state, caused by the 
different index number weightings. This 
table may therefore be regarded as more ana- 
lytical than the preceding table, which dealt 
in generalizations for the entire group of 
states as a whole. 
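
For a single state, the averaging runs across the six rankings (0, A, B, C, D, and E) instead of down the list of states, so that each state's figure rests on the 15 possible pairings of its six ranks. The sketch below illustrates this; the function name and the example ranks are hypothetical.

```python
from itertools import combinations

def state_displacement(ranks):
    """Average, minimum, and maximum absolute difference over all pairings."""
    diffs = [abs(a - b) for a, b in combinations(ranks, 2)]
    return sum(diffs) / len(diffs), min(diffs), max(diffs), len(diffs)

# Hypothetical ranks of one state under weightings 0, A, B, C, D, and E.
ranks_one_state = [1, 1, 2, 1, 1, 1]

average, smallest, largest, n_pairs = state_displacement(ranks_one_state)
print(round(average, 2), smallest, largest, n_pairs)
```

A state whose rank shifts by one place under a single weighting, as in this example, shows an average difference of about .3 with a maximum of 1.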

It is interesting to note that some states— 
California and South Carolina, for example— 
show very little variation in rank positions. 
There are, in fact, eight states which show an 
average variation of less than one rank from 
one index number to another. This fact is 
brought about by these particular states hav- 
ing characteristics which gave them consistent 
positions on all of the five traits entering into 
the index numbers. Weighting therefore has 
little effect on their positions; even extreme 
weighting would produce little variation. 

Some states, on the other hand—notably, 
Idaho and Connecticut — show a significant 
shifting with a maximum of nearly twenty 
places, and an average of over seven places. 





And this is something that should be borne in 
mind. While the average for the table is a 
shift of 2.5 ranks, and while the average inter- 
correlation of the index numbers for the dif- 
ferent weightings is about .98, there may be 
some individual cases that diverge from the 
general tendency and exhibit marked fluctua- 
tions. The cause of these particular responses 
to various weightings is an unusual lack of 
homogeneity in the development of the state 
educational programs, as measured by the five 
traits. But such cases are always possible in 
any correlation less than unity. One cannot 
tell from the correlation coefficient alone 
whether the observed degree of correlation 
indicates a uniform tendency of all the cases, 
or whether it is an average of a heterogeneous 
group—the result of most of the cases obeying 
a close relationship, with a few cases being 
very irregular and thus lowering the coeffi- 
cient. If the former is the case, then the 
effect of different weights will be uniform; if 
the latter is the case, the effect of weighting 
will not be uniform, and particular cases may 
occur (as in Table VI) where differences in 
weights produce violent changes in the rela- 
tive standing of those cases. 


It is of course true that changes such as 
those revealed by Table VI are more likely to 
occur near the middle of the group. In social 
affairs, this region is where there is likely to 











TABLE VI

DIFFERENCES IN RANK POSITIONS OF EACH STATE RESULTING FROM DIFFERENT
WEIGHTINGS OF FIVE TRAITS. N.E.A. DATA

                    Average       Minimum       Maximum
State               Difference    Difference    Difference
Alabama             .7            0             2
Arizona             ?             0             ?
Arkansas            2.3           0             5
California          ?             0             1
Colorado            3.0           0             6
Connecticut         7.8           0             19
Delaware            4.7           0             10
Florida             1.0           0             3
Georgia             1.2           0             3
Idaho               7.5           0             18
Illinois            1.9           0             4
Indiana             ?             0             3
Iowa                ?             0             7
Kansas              ?             0             ?
Kentucky            ?             0             ?
Louisiana           1.9           0             ?
Maine               3.5           1             8
Maryland            1.2           0             6
Massachusetts       1.1           0             5
Michigan            1.6           0             4
Minnesota           3.3           0             8
Mississippi         1.5           0             4
Missouri            2.5           0             6
Montana             3.0           0             8
Nebraska            3.0           0             7
Nevada              2.8           0             6
New Hampshire       1.9           0             5
New Jersey          4.9           0             13
New Mexico          .9            0             2
New York            ?             0             4
North Carolina      ?             0             2
North Dakota        2.2           0             6
Ohio                1.4           0             3
Oklahoma            1.5           0             4
Oregon              3.1           0             8
Pennsylvania        5.5           0             14
Rhode Island        4.1           0             10
South Carolina      .3            0             1
South Dakota        1.9           0             5
Tennessee           .5            0             1
Texas               .9            0             2
Utah                5.5           0             13
Vermont             1.5           0             4
Virginia            1.4           0             3
Washington          2.5           0             5
West Virginia       .6            0             1
Wisconsin           2.0           0             5
Wyoming             4.0           0             9

Total               117.1         1             288
Mean                2.5           .02           6

(? = entry illegible in the source.)

Differences are in units of rank position.

Basic data are given in Table II. Differences are between ranks of
index numbers 0, A, B, C, D, and E. “Average difference” is the mean
of the (absolute) differences for all possible pairings of the six
ranks: it is based on 15 differences for each state.
be least concern about exact placement.
It is worth bearing in mind, nevertheless, that
the correlation coefficient is simply an average,
and the tendencies lying behind that average
may be uniform, or heterogeneous. If the
latter, injustice will be done in certain cases
by the application of weights which [...]
justifiable.

It may be interesting in passing to note that
in Table VI there are seven states having a
maximum difference of ten or more, and [...]
other states having a maximum difference of
nine. Reference to Table II reveals that in
the case of every one of these states the
maximum difference occurs between weighting
patterns B and C. These patterns place
extreme weight on trait II, which shows [...]
correlation with the remaining traits.
Further, this analysis also reveals what
previous inspection of tables might lead one to
expect: that the index numbers based on
patterns B and C disagree with the other index
numbers in opposite directions. This may be
seen from Table II by noting that in every
one of the cases mentioned, the ranks of the
state for weightings B and C are on opposite
sides of the ranks given by equal weighting.
Hence, the results of weightings B and C show
a lower correlation between themselves (Table
III) than they do with the results of equal
weightings or with the index numbers based
on any other special weighting.

Conclusions from This Experiment 

The data presented in the various tables 
lead to the following conclusions: 

Index numbers based on at least five traits
which have an average intercorrelation of
about .73 are markedly stable under the
influence of a set of constant weights with a
maximum ratio of about 10:1.

Series which are relatively unique, as shown
by their correlation with the composite of the
remaining series, respond much more to the
influence of weighting than do series which
correlate more highly with the remaining
series. Extreme weights, therefore, applied
to an individualistic series produce greater
differences than when these same weights are
applied to other series.

Not only does the effect of given weights
vary with different series, but the effect varies
within a series, according to whether a case (a
state, in this study) exhibits average
consistency in its rating on the various traits
originally measured. Cases which exhibit
unusual heterogeneity in the characteristics
reflected by the different traits will show greater
than average displacement under the influence
of various weightings.


II. NATIONAL EDUCATION ASSOCIATION
DATA ON STATES (RANK VALUES)


The National Education Association presented
their five traits in terms of ranks, as
well as in terms of actual values. The two
sets of data (which are of course possible
whenever actual values are given) raised certain
questions. Is it better to use actual values
of the component traits for calculating index
numbers, or will ranks do just as well?
How much difference is likely to result? How
does this difference compare with the difference
in index numbers caused by changing the
weights of the series? Does the rank correlation
technique give satisfactory values? Is
the deep-seated prejudice of some statisticians
against ranks supported by calculations of the
present type?

To obtain light on such questions the study 
reported in the preceding section was entirely 
redone, using rank values instead of actual 
values, for the component traits. It must be 
kept in mind that this work was distinctly dif- 
ferent from that reported in the first section. 
Ranks which were shown or used in Tables 
II-VI were simply the ranks of the final re- 
sults—ranks of index number values. In the 
present section, the five original (component)
traits were converted into ranks before being 
summed. The series of sums (index num- 
bers) are again ranked—the same as was done 
for certain purposes in the first section—but 
the essential difference between the treatment 
in the two sections lies in the form of the 
original data. 
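
The distinction drawn above can be sketched in code. The following is only an illustration under stated assumptions: the data are hypothetical, the helper names `average_ranks` and `weighted_sums` are invented for this sketch, and tied values are assumed to share the mean of their ranks (as the half-ranks in the tables suggest). It ranks each component trait first, sums the ranks, and then ranks the sums.

```python
def average_ranks(values, highest_first=True):
    """Return ranks (1 = first place); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=highest_first)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def weighted_sums(series_list, weights):
    """Weighted sum across component series, giving one total per case."""
    n_cases = len(series_list[0])
    return [
        sum(w * series[i] for w, series in zip(weights, series_list))
        for i in range(n_cases)
    ]


# Hypothetical data: two component traits measured on four cases.
actual_traits = [[70, 55, 90, 55], [3.0, 2.0, 1.0, 4.0]]

# Procedure of the present section: rank each component trait, then sum.
ranked_traits = [average_ranks(trait) for trait in actual_traits]
rank_sums = weighted_sums(ranked_traits, weights=[1, 1])

# A low sum of ranks means a high standing, so the sums are ranked ascending.
final_ranks = average_ranks(rank_sums, highest_first=False)
print(ranked_traits)
print(rank_sums)
print(final_ranks)
```

In the first section only the last step (ranking the final index numbers) involved ranks; here the ranking is applied to the components as well.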


The Weighting Patterns of the Second Study 

The weighting patterns used for this second 
study are essentially the same as those for the 
first study, shown in Table I, except that the 
weights 5, 6, 7, 11 were each one less. This 
was not a purposeful change; it came about 
through working on the project at different 
times, with slightly different notions as to 
what was desirable. 

The change in weights makes no appreciable
difference in the index numbers. Its
effect may be measured by the correlation
between index numbers based on the two sets of
weights: 1, 5, 6, 7, 11 and 1, 4, 5, 6, 10.
In order to test the maximum effect of the
change, weighting pattern C was used, which
places the maximum weight in both cases upon
the trait which is least in agreement with
the remaining series. The two series of index
numbers resulting from these two weighting
patterns show a rank correlation of .9997,
there being a difference of one rank in the
placement of five states out of the forty-eight.
With any of the other four special weighting
patterns, we should expect even greater
agreement.
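
As a rough check on the figure of .9997, the usual rank-difference formula for the rank correlation, rho = 1 - 6(sum of d squared) / (n(n squared - 1)), can be applied. The sketch below assumes that this formula was the one used and that each of the five displaced states differs by exactly one rank in magnitude; the function name is hypothetical.

```python
def spearman_rho_from_differences(rank_differences, n_cases):
    """Rank correlation from the list of nonzero rank differences (d)."""
    sum_of_squares = sum(d * d for d in rank_differences)
    return 1 - 6 * sum_of_squares / (n_cases * (n_cases ** 2 - 1))


# Five states displaced by one rank each, out of forty-eight states.
rho = spearman_rho_from_differences([1, 1, 1, 1, 1], n_cases=48)
print(round(rho, 4))  # 0.9997
```

Under these assumptions the computed value agrees with the coefficient reported in the text.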

The rotated weighting patterns for this sec- 
ond study are shown in Table VII. 

TABLE VII
WEIGHTING PATTERNS USED ON THE RANK FORM OF N.E.A. DATA TO PRODUCE EXPERIMENTAL INDEX NUMBERS

Weighting     Weights of the Five Traits
Pattern       I     II    III   IV    V

0             1     1     1     1     1
A             1     4     5     6     10
B             10    1     4     5     6
C             6     10    1     4     5
D             5     6     10    1     4
E             4     5     6     10    1
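
The rotation of weights can be sketched as follows. This is a minimal illustration, assuming that pattern A is 1, 4, 5, 6, 10 and that each succeeding pattern moves every weight one trait to the right, the last weight wrapping round to trait I; the function name `rotated_patterns` is hypothetical.

```python
def rotated_patterns(base_weights, labels="ABCDE"):
    """Build the equal-weight pattern (0) and the rotated special patterns."""
    patterns = {"0": [1] * len(base_weights)}
    for shift, label in enumerate(labels):
        if shift == 0:
            patterns[label] = list(base_weights)
        else:
            # Move the last `shift` weights to the front (rotate right).
            patterns[label] = base_weights[-shift:] + base_weights[:-shift]
    return patterns


patterns = rotated_patterns([1, 4, 5, 6, 10])
for label, weights in patterns.items():
    print(label, weights)
```

With this rotation pattern C places the weight 10 on trait II and pattern E places the weight 1 on trait V, in agreement with the remarks made in the text about those patterns.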


Index Numbers from Ranked Components 

The six series of state index numbers result- 
ing from the six different weighting patterns 
are presented in Table VIII. These values, 
like those of Table II, were produced for ex- 
perimental purposes, but may be regarded as 
legitimate index numbers of the states. Those 
values for weighting pattern 0 (in which the 
ranked series are combined without special 
weighting) are comparable to other index 
numbers similarly constructed and previously 
published.* Some of the other series of this 
table may be superior to the equally weighted 
ones and may be employed. The choice is a 
matter of judgment. 

The correlation of the ranks in column 0 of 
Table VIII with the revised Schrammel and 
Sonnenberg index number’ (the form simi- 
larly based on ranked components) is .865. 

The purpose of preparing the data for
Table VIII was, however, not primarily to
present another set of state index numbers,
but to afford a comparison with the data of
Table II, so as to show the effect on index
numbers of ranking component series instead
of taking them at actual values. The
comparison between these two sets of resulting
index numbers is analyzed in the following
paragraphs.


* See notes 1-5. 
TABLE VIII
RANK OF STATES ON INDEX NUMBERS DERIVED FROM VARIOUS WEIGHTINGS OF THE RANKS OF FIVE TRAITS SUGGESTED BY THE NATIONAL EDUCATION ASSOCIATION

Rank of States on Index Numbers From Each of Six Weighting Patterns

States            Equal (0)  A  B  C  D  E

Alabama           45.5  45  45  44  46  14
Arizona           29  27  31  32  24  30
Arkansas          40  40  44  39  41  43
California        1  1  1  1  1  1
Colorado          22  22  27  24  23  20
Connecticut       3  11  5  25  12
Delaware          23  28  21  28  21  17
Florida           37  37  37  35  39  35
Georgia           47.5  47  46  47  47  47
Idaho             24  18  29  17  22  27
Illinois          10  13  9  13  10  8
Indiana           17  23  20  14  14  15
Iowa              21  18  12  20  93
Kansas            26  26  28  21.5  26  2
Kentucky          41.5  41  11  43  40  45
Louisiana         13  44  42  46  2  40
Maine             27  30  22  21.5  28  28
Maryland          34  33  32  38  31  32
Massachusetts     3  6  3  8  3
Michigan          5  8  4  3.5  6  4
Minnesota         16  14  14  18  19  1f
Mississippi       45.5  46  47  41  45  46
Missouri          31  31  26  30  30  29
Montana           11  9  11  9  13  14
Nebraska          20  17  23  15  25  22
Nevada            4  2  S  2  5
New Hampshire     19  24  16  20  18  21
New Jersey        13  15  7  23  11  7
New Mexico        36  38  36  34  35  7
New York          2  3  2  5  2  2
North Carolina    14  43  43  45  44  !
North Dakota      30  25  30  26  32  31
Ohio              6  12  6  6  4  3
Oklahoma          33  34  35  31  34  34
Oregon            8  4  15  10  8  10
Pennsylvania      91  20  19  27  17  19
Rhode Island      29  25  33  29  24
South Carolina    47.5  48  48  48  48  48
South Dakota      25  19  24  19  27  25
Tennessee         39  39  39  40  38  39
Texas             38  36  38  36  37  38
Utah              9  10  10  3.5  9  18
Vermont           32  32  33  29  33  33
Virginia          41.5  42  40  42  43  42
Washington        7  7  13  7  7  11
West Virginia     35  35  34  37  36  36
Wisconsin         15  16  12  16  16  12
Wyoming           13  5  17  11  15  13

The code for the weighting patterns applied to the ranked component traits is given in 
Table VII. When the five component traits are taken at actual, rather than rank, values, the 
results are those shown in Table II. 
The Effect of Ranking the Component Traits


Ranking makes series have the same vari- 
ability (with negligible differences arising 
from an occasional tie), when the populations 
are constant, as in the present case. The 
series are therefore equally weighted when 
they have been converted into ranks. Ranking,
however, imposes an additional element
in that all of the values in a series are equally 
distant (one rank apart) instead of differing 
by various amounts. This fact operates to 
modify the effect of weights on the series. 
Whereas a series of actual values might have 
an extreme value—even though the series is 
no more variable than other series are, with 
respect to the average variability for a series 
—which would dominate the placing of that 
particular case in the index number, this could 
not occur in the case of ranks. 


On the other hand, differences between cer- 
tain pairs of values may be very small in the 
case of series of original values; when these 
series are ranked, no case will differ from an- 
other by a smaller amount than any other case 
will. In other words, whereas equalizing the 
standard deviations of series will equalize 
their variability (and weight) on the average, 
ranking goes further and adds to the equal 
variability the element of equal differences. 
Equal weighting, then, for ranked series, is 
not a matter of an average weighting for the 
series, but is a matter of uniformly equal 
weighting throughout the series. Whether or 
not this additional element is desired depends 
on what is wanted. In the present study we 
are concerned only with observing its effect. 
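
The statement that ranking gives every series the same variability can be illustrated with a short sketch. The two series below are hypothetical, ties are assumed absent, and the helper `ascending_ranks` is an invented name; the point is only that two series of very different spread have identical standard deviations once ranked.

```python
import math
import statistics


def ascending_ranks(values):
    """Ranks 1..n with 1 for the smallest value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


# Hypothetical series: one tightly bunched, one widely spread.
narrow = [50, 51, 52, 53, 54, 55]
wide = [1, 10, 100, 1000, 10000, 100000]

sd_narrow_ranked = statistics.pstdev(ascending_ranks(narrow))
sd_wide_ranked = statistics.pstdev(ascending_ranks(wide))

# For n untied ranks the standard deviation is sqrt((n^2 - 1) / 12).
n = len(narrow)
print(sd_narrow_ranked, sd_wide_ranked, math.sqrt((n * n - 1) / 12))
```

Because every untied series of the same length reduces to the values 1 to n, the equality holds whatever the original distributions were; the spacing between adjacent cases is likewise forced to one rank.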


The difference produced in index numbers 
by ranked and unranked component series in 
the present case is represented by a (rank) 
correlation of .988, when the component series 
are equally weighted. That is, index num- 
bers based on equally weighted component 
series, using actual (unranked) values of the 
component series in the one case, and ranked 
values in the other, correlate to the extent of 
.988. The amount by which this coefficient 
is less than unity represents the difference in- 
troduced by the ranking of the five compo- 
nent traits before they are summed to form 
index numbers. 


The effect of ranking varies under the influ- 
ence of different weighting patterns. Corre- 
lation coefficients, similar to the one just de- 
scribed, for the various weighting patterns are 
shown in Table IX. It will be noted that the
correlation is the highest for weighting E, and
the lowest for weighting A. The explanation
is largely to be found in trait V, which
receives the least weight (1) under pattern E
and the greatest weight (11 in the first study,
10 in the second) under pattern A.
It happens that trait V is extremely skewed,
more than half of the cases lying in the top
interval (Table X). This trait represents one
of those cases in which ranked data must
necessarily depart significantly from the
distribution of actual values, for the ranked
data are distributed uniformly along the scale,
and cannot bunch at one end.


TABLE IX
CORRELATION BETWEEN INDEX NUMBERS BASED ON ACTUAL VALUES (TABLE II) AND ON RANKS (TABLE VIII) OF COMPONENT SERIES, WHEN VARIOUS WEIGHTING PATTERNS ARE USED

              Coefficient
Weighting     (rho)

0             .988
A             .969
B             .982
C             .988
D             .990
E             .994

TABLE X
DISTRIBUTION OF ACTUAL VALUES FOR TRAIT V

Interval (labels illegible), from the top interval downward, with frequencies:
26, 5, 2, 2, 1, [...], 2, [...], 1, 2, [...], [...], 1, 2, 1

Total         48


While such a departure from the bunching 
of actual values occurs in most distributions 
when they are ranked, it usually occurs 
toward the middle of the range, and not at the 
end. Forced departures of the shape of rank 
distributions from the shape of the actual data 
are more disturbing when they occur at the 
ends of the distribution, because they lower 
the correlation more. In the present case the 
(product-moment) correlation between the
actual and the rank values of trait V by itself
is .838, a significant drop from perfect
correlation. By way of contrast, the correlation
between the actual and ranked values of trait
IV is .985, which is very satisfactory. The
distribution of actual values of trait IV is
roughly rectangular, the kind of distribution
that changes little in characteristics when it is
converted into ranks. Trait III has a corre- 
lation of .961 between its actual and ranked 
values. Correlations for traits I and II were 
not calculated; but the frequency distribu- 
tions were similar to those of traits III and IV. 
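
The contrast between traits IV and V can be illustrated by correlating a series with its own ranks. The sketch below uses two hypothetical ten-case series, one evenly spaced (rectangular) and one bunched at the top of the scale; the helper names are invented for the illustration and ties are assumed absent.

```python
import math


def pearson(x, y):
    """Product-moment correlation between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ascending_ranks(values):
    """Ranks 1..n with 1 for the smallest value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


# Hypothetical data: a rectangular series and one bunched at the top.
rectangular = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
bunched_at_top = [5, 40, 70, 90, 99.5, 99.6, 99.7, 99.8, 99.9, 100]

r_rectangular = pearson(rectangular, ascending_ranks(rectangular))
r_bunched = pearson(bunched_at_top, ascending_ranks(bunched_at_top))
print(round(r_rectangular, 3), round(r_bunched, 3))
```

The evenly spaced series correlates perfectly with its ranks, while the bunched series does not, which is the behavior the text attributes to trait V.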


Intercorrelations Between Index Numbers 

The foregoing section dealt with the rela- 
tionship between the index numbers computed 
from two different forms of component series. 
We may now consider the interrelationship 
between the index numbers derived from the 
ranked components, to ascertain the extent to 
which the ranking has disturbed their internal 
relationship. 

The correlations between the index num- 
bers which are based on the variously 
weighted ranked traits are presented in Table 
XI. It will be seen that these values are of 
the same general order as the rank correlation 
coefficients presented in Table III. The mean 


value for the left side of the table is only .002
lower than the corresponding figure in Table
III, and the mean for the right side is .007
lower. Although there are a number of minor
shifts in the relative positions of the
coefficients from Table III to Table XI, it cannot
be said that the ranked series respond very
differently to the weights which were used
from what the actual values did. The story
told in the two tables is almost identical.


Displacement in Rank Positions 

If we take the five top ranking states in
each of the six series of index numbers, as
shown in Table XII, we find a general, but
not complete agreement with the lists in Table
IV. Out of the thirty positions in the six
columns, eight have dropped below fifth rank
and have been replaced by other states in
Table XII. The agreement in content of the
two tables is 73 per cent. The average rank
(according to Table II) of those states which
have dropped out of the first five in Table
XII is 4; the average rank (in Table II) of
the eight states which replaced them is 7.7,
a difference of nearly four ranks on the
average. On the other hand, changes in rank
position among the first three states of each
column are slight.

In lieu of presenting complete tabulations 
for this study corresponding to Tables V and 
VI, we may compare the general averages for 
such tables. Since the correlation values in 
Table XI are only slightly lower than those in 
Table III, it would be expected that the aver- 
age displacement caused by different weight- 
ings would be only slightly larger.1* In com- 
parison with average differences of 1.8 and 
2.8 for Table V, the corresponding values for 
the present study are 2.1 and 3.2. The aver- 
age minimum difference between the rank of 
the states from one index number to another 
is .02 in Table VI, and .1t in the present


TABLE XI
INTERCORRELATIONS AMONG INDEX NUMBERS DERIVED FROM DIFFERENT WEIGHTINGS OF FIVE RANKED TRAITS. N.E.A. DATA

Correlations between index numbers based on
equally weighted ranks of traits (0) and index
numbers based on specially weighted ranks of
traits (A-E)

Weighting     Rank
Patterns      Correlation
[...]         .994
[...]         .980
[...]         .979
[...]         .977
[...]         .958

Correlations between index numbers based on
various pairings of specially weighted ranks
of traits (A-E)

Weighting     Rank
Patterns      Correlation
[...]         .979
[...]         .978
[...]         .972
[...]         .963
[...]         .953
[...]         .950
[...]         .942
[...]         .936
[...]         .913
[...]         .909
TABLE XII
[...] RANKING STATES ACCORDING TO INDEX NUMBERS DERIVED FROM DIFFERENT WEIGHTINGS OF FIVE RANKED TRAITS. N.E.A. DATA

Equal
Weights
(0)        A        B        C        D        E

Calif.     Calif.   Calif.   Calif.   Calif.   Calif.
[...]      Nev.     N.Y.     Nev.     N.Y.     N.Y.
[...]      N.Y.     Mass.    Utah     Mass.    Ohio
[...]      *Ore.    Mich.    *Mich.   Ohio     *Mich.
*Mich.     *Wyo.    *Conn.   N.Y.     *Nev.    Mass.

* These states were not in this column in
Table IV. They represent changes in the
states which comprise the top five according to
a given index number, due to taking ranks
of the component series rather than actual
values.


study; the average maximum difference for
Table VI is 6.0, and is 6.6 for the present
study.

The various results in this second experiment
must be thought of as applying to
ranked components which have an average
intercorrelation of .670. The average
intercorrelation for the original (unranked) values
was .732.


Conclusions from Second Study 


The amount of change in relative position
produced by ranking a series depends upon
the shape of the frequency distribution of the
original values. It is possible that no change
in relative position at all will occur.
Rectangular (flat) distributions cause least
change in relative position when converted
into ranks; violently skewed J-shaped
distributions affect correlations most when converted
into ranks. Observed product-moment
correlations of ranks with actual values of the
same series ranged from .985 down to .838.


Expressing the original values of component 
traits in terms of ranks should not ordinarily 
cause any material change in the resulting in- 
dex numbers. Even those distributions which 
are markedly altered when ranked will not be 
significantly disturbing to an index number 
unless weighted heavily. Correlations be- 
tween index numbers based on ranked values 
and on actual values of component series 
ranged from .994 down to .969, according to 
the weighting pattern applied to the ranked 
series. When the heavy weight falls on a
series that is highly skewed, the resulting
index number is affected more. For equal
weights, the correlation coefficient was .988
(Table IX).

The average intercorrelation between index 
numbers based on different weightings of 
ranked components (Table XI) is almost 
identical with the average for index numbers 
based on actual values of component series 
(Table III). The internal relationships be- 
tween index numbers do not appear to be 
markedly disturbed by the ranking. 

The ranking of component traits does cause
some changes in the index numbers even when
equally weighted. These changes would not
seem to be as large as the possible
discrepancies arising from other, unknown causes
(such as lack of validity).

Whether the changes arising from ranking
are in a desirable direction, and represent a
gain rather than a loss, is a matter for
judgment. Without an accepted external
criterion, statistics can only analyze and measure
the amount of the change; statistics cannot
pass upon its desirability.


III. SCHRAMMEL AND SONNENBERG'S DATA


It was desired to carry on the study of the 
effect of constant weights on index numbers 
with a larger number of series. The data pub- 
lished by Schrammel and Sonnenberg,’* giv- 
ing an index number of states based on eleven 
traits, were employed in a third study which 
follows in general outline the first two. 

The pattern of weights had to be adjusted 
somewhat to cover the eleven traits, but was 
kept as nearly like the patterns previously 
used as possible. The weighting patterns for 
this third study are shown in Table XIII. 
In the present experiment, the weights were 
rotated the same as in the first two studies, 
but were moved two series at a time (three 
series in one case) so as to produce five dif- 
ferent special weightings, as in the first two 
experiments. 

This and the remaining studies will be pre- 
sented briefly, since the general outlines of 
the attack and the incidental analyses are now 
obvious. The rank of the states according to 
the resulting index numbers will not be pre- 
sented but analyses of the table will be given. 
The equally weighted index number has of 
course already been published. 


The Results of Varying the Weights 


Intercorrelations among the index numbers
resulting from the changes in weighting
patterns are shown in Table XIV. It will be
noticed that these are much lower than those
observed in the first and second studies. The
average correlation on the left side of Table
XIV is .05 lower than in Table III, and the
average for the right side is .11 lower. In
fact, only one value in Table XIV exceeds
the lowest value reported in Table III.


JOURNAL OF EXPERIMENTAL EDUCATION [Vol. 6, N


TABLE XIII
WEIGHTING PATTERNS APPLIED TO THE ELEVEN TRAITS OF SCHRAMMEL AND SONNENBERG

[Table body not recoverable.]


TABLE XIV
INTERCORRELATIONS AMONG INDEX NUMBERS DERIVED FROM DIFFERENT WEIGHTINGS OF ELEVEN RANKED TRAITS. SCHRAMMEL AND SONNENBERG DATA

Correlations between index numbers based on
equally weighted ranks of traits (0) and index
numbers based on specially weighted ranks of
traits (A-E)

Weighting     Rank
Patterns      Correlation
0, E          .958
0, [...]      .949
0, [...]      .940
0, A          .935
0, D          .904

Mean          .937

Correlations between index numbers based on
various pairings of specially weighted ranks
of traits (A-E)

Weighting     Rank
Patterns      Correlation
[...]         .916
[...]         .913
[...]         .910
[...]         [...]


TABLE XV
DIFFERENCES IN RANK POSITIONS OF THE FORTY-EIGHT STATES RESULTING FROM DIFFERENT WEIGHTINGS OF ELEVEN TRAITS. SCHRAMMEL AND SONNENBERG DATA

Averages are for the group of forty-eight states

Equal Weights (0) and Special Weights (A-E)

Pairing of    Corre-   Aver.   Min.    Max.
Index Nos.    lation   Diff.   Diff.   Diff.
0-A           [...]    3.60    [...]   12
0-B           [...]    3.42    [...]   13
0-C           [...]    3.17    [...]   15
0-D           [...]    4.38    [...]   16
0-E           [...]    3.23    [...]   10

Mean          [...]    3.56    [...]   13

Special Weights (A-E)

Pairing of    Corre-   Aver.   Min.    Max.
Index Nos.    lation   Diff.   Diff.   Diff.
[...]         [...]    4.08    [...]   18
[...]         [...]    4.42    [...]   20.5
[...]         [...]    7.06    [...]   28
[...]         [...]    5.48    [...]   16
[...]         [...]    4.17    [...]   15
[...]         [...]    6.60    [...]   22
[...]         [...]    5.38    [...]   18
[...]         [...]    6.02    [...]   19
[...]         [...]    5.77    [...]   19
[...]         [...]    4.60    [...]   15

Mean          [...]    5.36    [...]   19


The amount of variation between the placement
of the states by the differently weighted
index numbers is shown in Table XV. The
TABLE XVI
DIFFERENCES IN RANK POSITIONS OF EACH STATE RESULTING FROM DIFFERENT WEIGHTINGS OF ELEVEN TRAITS. SCHRAMMEL AND SONNENBERG DATA

                  Average      Minimum      Maximum
State             Difference   Difference   Difference

Alabama           3.1          0            7
Arizona           5.5          .5           13
Arkansas          2.9          0            [...]
California        5.4          1            11
Colorado          9.1          0            18
Connecticut       6.8          1            16
Delaware          10.0         1            23
Florida           1.5          0            3
Georgia           3.3          1            7.5
Idaho             5.5          0            12
Illinois          6.9          0            16.5
Indiana           5.7          .5           15
Iowa              6.4          1            14.5
Kansas            8.2          2            19
Kentucky          1.9          0            5.5
Louisiana         3.2          0            7
Maine             5.9          0            14
Maryland          5.7          0            14
Massachusetts     3.5          0            g&
Michigan          4.6          1            10.5
Minnesota         1.6          0            4
Mississippi       3.3          0            8
Missouri          2.5          0            6
Montana           1.6          0            4
Nebraska          11.1         1            28
Nevada            2.4          .5           5
New Hampshire     6.6          1            15
New Jersey        10.0         0            20
New Mexico        6.6          0            13
New York          8.9          0            20
North Carolina    1.8          0            4
North Dakota      8.2          0            18
Ohio              7.5          2            16.5
Oklahoma          3.9          0            8
Oregon            7.6          1            18
Pennsylvania      5.3          1            14
Rhode Island      3.3          1            7
South Carolina    1.0          0            3
South Dakota      8.2          1            16
Tennessee         1.8          0            4
Texas             3.1          0            6
Utah              1.2          0            8
Vermont           5.5          1            13
Virginia          2.0          0            5
Washington        2.0          0            4
West Virginia     1.5          0            3
Wisconsin         6.1          .5           11
Wyoming           .9           0            2

Mean              4.78         .4           10.8


“Average Difference” is the mean of the 
(absolute) differences for all possible pairings 
of the six series of index numbers based on the 
six weighting patterns shown in Table XIII. 
The Average Difference for each state is based 
on 15 differences. 
mean maximum difference for the left side is
13, as compared with 7 in Table V; and the
difference for the right side is 19, as compared
with 11 in the earlier table.

Table XVI gives the amount of variation in 
rank from one index number to another, for 
each individual state. It will be seen that 
the values for the table are larger than those 
for Table VI. The two sets of figures are 
as follows: 


Table VI Table XVI 


Mean Average Difference__ 2.5 4.8 
Mean Minimum Difference. 0.02 0.38 
Mean Maximum Difference 6 11 


The average difference and the maximum dif- 
ference are roughly twice as great as for the 
earlier data. 
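
The three measures compared above follow the definition given in the notes to Tables VI and XVI: for each state, the absolute differences between all possible pairings of its six ranks (15 differences) are averaged, and the smallest and largest are noted. A minimal sketch, using a hypothetical set of six ranks for one state and an invented function name:

```python
from itertools import combinations


def rank_displacements(ranks):
    """Average, minimum and maximum absolute difference over all pairings."""
    differences = [abs(a - b) for a, b in combinations(ranks, 2)]
    average = sum(differences) / len(differences)
    return average, min(differences), max(differences), len(differences)


# Hypothetical ranks of one state under weightings 0, A, B, C, D and E.
state_ranks = [10, 12, 9, 13, 10, 11]
average, minimum, maximum, n_pairs = rank_displacements(state_ranks)
print(n_pairs, round(average, 1), minimum, maximum)  # 15 1.8 0 4
```

The table means are then simply the averages of these per-state figures over the forty-eight states.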


Analysis of the Results 


Inspection of Table XIV reveals that the 
index number having weighting D correlates 
lowest with any of the other index numbers; 
it is at the bottom on the left side and occu- 
pies the three bottom positions on the right 
side. Interestingly enough, it also occupies 
the highest position on the right side. The 
only inference is that weighting E deviates 
from the rest of the index numbers in the 
direction of D sufficiently to give the highest 
correlation in the table. 

The crux of the results found for the 
Schrammel and Sonnenberg data lies in the 
extremely low intercorrelations between the 
various traits which were employed. To use 
a single state as a suggestion of what happens, 
New York has the following ranks in the 
eleven traits: 1, 2, 2, 4, 9, 15, 18, 27, 38, 
44, 45. A number of other states share similar
vicissitudes. The picture for the entire
set of forty-eight states is given by the fol- 
lowing correlations: 





Correlations between the sum of traits [...]
and the sum of the remaining traits:

[...]         .111
[...]         .427
[...]         .378
[...]         .549
[...]         .208

Average intercorrelation between
the eleven traits          .132


When relationships between component 
traits are this low, the effect of weights will 
be marked, and precise analysis can not be 
made. That is, the correlations for the
different weighting patterns will not follow
closely the magnitude of the intercorrelations
of the original series because, when correlation 
values are low, there are too many different 
possible structures, so that roughly, almost 
anything may happen. It may be wondered 
that correlation coefficients averaging around 
.90 (Table XIV) would be found for the dif- 
ferent weightings of such unrelated data. 

It might be proper to raise a question, in 
passing, as to the desirability of including 
traits which have a very low correlation with 
the other traits, in an index number for rat- 
ing purposes. It is true that the philosophy 
of factor analysis, in psychology, is to find 
traits which do not have anything in common 
with each other; but it is also recognized that 
these will, if and when found, be fundamental 
traits and not “surface effects.” They will 
not be phases or aspects or parts that can 
readily be observed; they will not be such 
things as one can get quantitative data on for 
including in an index number of this sort. It 
would seem within reason to expect traits rep- 
resenting different observable phases of “good- 
ness’’ of a complex phenomenon, such as state 
educational activity, to correlate pretty well 
with other phases—taking all of the states as 
a whole. Perhaps a minimum correlation of 
.50 or .65 would be a suitable criterion. 
When one includes traits that bear little
relationship to the rest of his traits, it would seem
there was a priori reason to give such traits a
low weighting. If all of the traits individually
show a low relationship with one another,
or with the rest of the traits as a group, then
one should question whether he has selected
traits that bear upon a homogeneous concept.


IV. CHAMBERLAIN’S DATA 


A fourth study was made on a still larger 
number of traits, and a larger population. 
Chamberlain’* published data on the educa- 
tional activities of 120 counties in Kentucky. 
He used 15 traits, weighted them equally, and 
combined them, giving the median of the 
sigma values for each county. He also gave 
rank values for each trait, which were used 
in the present study. Because data were in- 
complete for twelve of the counties, these 
cases were dropped, leaving a population 
of 108. 

The weighting patterns are shown in Table 
XVII. They were moved three columns 
(traits) at a time, resulting in five special 
weightings in addition to the equal weightings,
as in the previous studies. When this
was done, the series of resulting index numbers
yielded correlation values as shown in
Table XVIII. These values are practically
as high as those given in Table III, being on
the average only .012 lower for the left half
of the table, and .009 lower for the right side.

An analysis of differences was made, simi- 
lar to those given earlier in Tables V and XV 
In this case, the frequency distribution was 
made for all of the five or ten pairings 
index numbers together, rather than giving 
the average difference separately for each 
pairing. It will be seen that there are a few 
differences that are very large; on the other 
hand, 60 per cent of those in the left column 
and 42 per cent of those in the right column 
are differences smaller than six. In consider- 
ing differences in this table, it must be borne 
in mind that there are 108 cases instead of 
48. Each rank position is therefore a smaller 
portion of the range. Reducing the average 
difference to comparable terms with those of
the preceding tables, the values become
and 4.1 ranks, respectively (based on a population
of 48). These values are approximately
50 per cent higher than those of
Table V.

The average intercorrelation of the original
15 series of ranked data is .524. 
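
An average intercorrelation of this kind may be computed as the mean of the correlations over all pairs of series (105 pairs for 15 series). The following Python sketch assumes product-moment correlation applied to the ranked series; the function names and the small toy data are hypothetical and are not Chamberlain's figures.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_intercorrelation(series):
    """Mean correlation over every pair of series (traits)."""
    pairs = list(combinations(series, 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


# Hypothetical ranks of five cases on three traits.
toy_ranks = [
    [1, 2, 3, 4, 5],
    [2, 1, 3, 5, 4],
    [1, 3, 2, 4, 5],
]
toy_average = average_intercorrelation(toy_ranks)
```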


V. TEACHERS’ COST-OF-LIVING INDEX


It was desired to include in this general
analysis an index number of the economic
type, in addition to the four index numbers
for rating purposes. For this fifth study,
data were taken from a National Education
Association research bulletin¹⁴ on the cost of
living for teachers. The principal question
is, How much will this index number vary
under the influence of weighting patterns such
as those used in the preceding experiments?

The question is in part answered by data
contained in the Research Bulletin¹⁵ which reviews
six other index numbers differing chiefly
in the weights which are used. Five of these
are graphed, showing the variations which re- 
sult from differences in weighting. In the 
year of largest disparity (1932-33) the maxi- 
mum difference was 10.6 per cent, the mini- 
mum difference was 0.5 per cent, and the 
mean difference was 5.2 per cent.¹⁶ The
weighting patterns used in these different in- 
dex numbers are somewhat difficult to pre- 
sent, because the categories varied. Perhaps 
the present experiment will provide a suffi- 
cient number of comparisons, without an at- 
TABLE XVII

WEIGHTING PATTERNS APPLIED TO THE FIFTEEN TRAITS OF CHAMBERLAIN’S STUDY

Pattern     I   II  III   IV    V   VI  VII VIII   IX    X   XI  XII XIII  XIV   XV
O           1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
A           1    1    1    4    4    4    5    5    5    6    6    6   10   10   10
B          10   10   10    1    1    1    4    4    4    5    5    5    6    6    6
C           6    6    6   10   10   10    1    1    1    4    4    4    5    5    5
D           5    5    5    6    6    6   10   10   10    1    1    1    4    4    4
E           4    4    4    5    5    5    6    6    6   10   10   10    1    1    1


TABLE XVIII

CORRELATIONS AMONG INDEX NUMBERS DERIVED FROM DIFFERENT WEIGHTINGS OF FIFTEEN
RANKED TRAITS. CHAMBERLAIN’S DATA

Left half: Correlations between index numbers based on equally weighted ranks
of traits (O) and index numbers based on specially weighted ranks of traits (A-E).
(The entries of this half are not legible in the source.)

Right half: Correlations between the index numbers based on various pairings
of specially weighted ranks of traits (A-E).

Rank           Weighting
Correlation    Patterns
.983           C-E
.971           A-C
.966           B-E
.966           A-D
.952           B-C
--             A-E
--             C-D
--             D-E
--             A-B
--             B-D

Mean           --

(-- indicates an entry that is not legible in the source.)

TABLE XIX

DIFFERENCES IN RANK POSITIONS OF THE 108 COUNTIES RESULTING FROM DIFFERENT
WEIGHTINGS OF 15 TRAITS. CHAMBERLAIN’S DATA

Left column: Differences between index numbers based on weightings O,A; O,B;
O,C; O,D; and O,E.

Right column: Differences between index numbers based on weightings A,B; A,C;
A,D; A,E; B,C; B,D; B,E; C,D; C,E; D,E.

[Frequency distribution by class interval, running from 51-53.5 downward in
steps of three ranks, followed by the number of differences, the mean
difference, the median difference, and the comparable mean difference for
48 ranks. The entries are not legible in the source.]
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tempt at analyzing the weightings of the other 
index numbers published, which would neces- 
sarily be unsatisfactory because of too many 
basic differences. 

The weighting patterns used in the present 
experiment are shown in Table XX. The 
weights used by the N.E.A. in the index num- 
ber computed by the Research Division are 
also shown since comparisons are made with 
that index number. Since the N.E.A. weights 
were applied to the series as they stood—each 
series with its natural weighting, the weights 
are not directly comparable to those which 
have been used throughout the experiments in 
the present study, which were all applied to 
series that were equally weighted to begin 
with. These N.E.A. weights are therefore 
shown in two forms in Table XX—first the 
nominal form, and second the effective form. 
The latter represents a combination of the 
nominal weights and the natural weights of 
the various series, thus giving the weights 
that would have been applied to equally 
weighted series to produce the same effect.¹⁷
All of the weighting patterns shown have been
divided through by the smallest weight 
(greater than zero) so as to reduce the small- 
est weight to unity and thus facilitate inspec- 
tional estimates of the relative force of differ- 
ent sets of weights, insofar as this force may 
be indicated by the ratios of each set. 
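
The reduction of nominal weights to effective weights may be illustrated by a short Python sketch. It assumes, as described above, that the effective weight of a series is the product of its natural weight (its relative standard deviation) and its nominal weight, and that each set of weights is divided through by its smallest value greater than zero. The function names and the numbers are hypothetical; they are not the N.E.A. figures.

```python
def normalize(weights):
    """Divide through by the smallest weight greater than zero."""
    smallest = min(w for w in weights if w > 0)
    return [w / smallest for w in weights]


def effective_weights(natural, nominal):
    """Combine natural and nominal weights, then reduce the smallest to unity."""
    products = [n * m for n, m in zip(natural, nominal)]
    return normalize(products)


# Hypothetical values; the last series has zero variability.
natural = [2.0, 1.0, 4.0, 0.0]
nominal = [3.0, 5.0, 1.0, 2.0]
combined = effective_weights(natural, nominal)
```

In this toy case the products are 6, 5, 4 and 0, so the effective weights are 1.5, 1.25, 1 and 0. The series without variability receives zero effective weight whatever nominal weight is attached to it.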

The results of the various weightings are
shown in Table XXI. It will be noted from
an inspection of the table that weighting A
gives the highest results, B the next highest,
and H the lowest (especially for the last
year). Weighting O, as usual, gives “middle
of the road” results. Weights A and B emphasize
trait VIII, which, even though the sigmas
have been equalized, is a “slow moving”
trait, being 100 or above for three years of
the six years. Weighting H, on the other
hand, gives least weight to trait VIII, and
produces the lowest values. Weights of the
other traits, of course, also enter in to affect
the results observed.

It is interesting to observe that, with a sin- 
gle exception in the table, all of the index 


TABLE XX

WEIGHTING PATTERNS APPLIED TO THE EIGHT ITEMS (TRAITS) ENTERING INTO THE
COST-OF-LIVING INDEX NUMBER

Weighting Pattern      Food (I)   Clothing (II)   Rent (III)
Natural* 3 3.4 
N.E.A. Nominal** 10.1 6.0 
11.1 4.3 
* These are the ratios of the standard deviations of the various traits to the smallest
standard deviation greater than zero.

** These are the (ratios of the) weights applied by the N.E.A. to the series as they stood,
each series having its natural weighting. These weights are those reported by the N.E.A.,
divided through by the smallest weight so as to make the smallest value unity.

‡ These weights represent the combined effect of the natural and nominal weights; they
are the ones which should be taken as the real weights affecting the traits entering into the
N.E.A. index number. They are obtained by multiplying the natural weights by the nominal
weights, and dividing through by the smallest product greater than zero, so as to reduce the
smallest weight to unity.

‡‡ The observed series for trait VII had zero variability; that is, all of the values in the
series were the same. Its effective weight therefore becomes zero, regardless of any nominal
weights applied to it. It has no effect in the positioning of any case (year) in the index number.
It was omitted in the experimental work, and the entire column of weights could be
dropped from the table. Since in pattern G the base weight (1) becomes ineffective, a row G′
is given, showing ratios to the smallest effective weight in that pattern.
TABLE XXI

EXPERIMENTAL COST-OF-LIVING INDEX NUMBERS DERIVED FROM VARIOUS WEIGHTINGS
OF EIGHT TRAITS

Year      N.E.A.*      O      A      B      C      D      E      F      G      H
1928-29    100.0   100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0
1929-30     98.6    99.0   99.4   99.3   99.2   99.0   99.0   99.1   98.6   98.4
1930-31     91.7    94.7   96.4   95.2   94.5   94.5   95.2   94.7   93.3   92.8
1931-32     83.5    89.3   92.6   90.4   88.8   88.6   90.1   88.4   87.6   86.5
1932-33     77.3    84.6   88.6   86.3   84.5   83.4   84.7   82.7   83.1   82.1
1933-34     80.6    87.3   90.2   88.5   88.1   85.4   86.6   85.8   86.8   86.0

* As given by the Research Division, p. 236, of the reference cited in footnote 14.


numbers for all of the years are higher than
the N.E.A. index numbers. This fact brings
into relief one of the deficiencies of the sampling
provided by the weighting patterns
used. It is possible so to arrange the weights
in any one of the patterns A-H, that an index
number value lower than the N.E.A. values
will result. For example, the pattern: 11, 7,
9, 1, 6, 5, 6, 3 does this. It was realized from
the start that the patterns should have been
reversed and rotated; but facilities were not
available for following up all of the possibilities
which presented themselves. It was
thought best, within the scope of the work set,
to follow the one pattern systematically.

The data in Table XXI are analyzed in
Tables XXII and XXIII. The average difference,
over the six years, for any two pairs


TABLE XXII

DIFFERENCES IN THE VALUES OF COST-OF-LIVING INDEX NUMBERS RESULTING FROM DIFFERENT
WEIGHTINGS OF EIGHT TRAITS

Averages are for the six years

Equal Weights (O) and Special Weights (A-H)

Pairing of     Aver.    Min.    Max.
Index Nos.     Diff.    Diff.   Diff.
O-A            --       0       3.3
O-B            .80      0       1.7
O-C            --       0       .8
O-D            --       0       1.9
O-E            --       0       .8
O-F            --       0       1.9
O-G            --       0       --
O-H            --       0       2.8

Mean           --       0       1.86

Special Weights (A-H)

Pairing of     Aver.    Min.    Max.
Index Nos.     Diff.    Diff.   Diff.
A-B            1.58     0       2.3
A-C            --       0       4.1
A-D            2.71     0       5.2
A-E            1.93     0       3.9
A-F            2.75     0       5.9
A-G            2.97     0       5.5
A-H            3.57     0       6.5
B-C            --       0       1.8
B-D            1.47     0       3.1
B-E            .65      0       1.9
B-F            --       0       3.6
B-G            --       0       3.2
B-H            2.31     0       4.2
C-D            --       0       2.7
C-E            .65      0       1.5
C-F            --       0       2.3
C-G            --       0       1.4
C-H            1.55     0       2.4
D-E            .62      0       1.5
D-F            .27      0       --
D-G            --       0       1.4
D-H            1.5      0       2.1
E-F            .85      0       2.0
E-G            --       0       2.5
E-H            1.83     0       3.6
F-G            --       0       1.4
F-H            .95      0       1.9
G-H            --       0       1.1

Mean           1.42     0       2.85

Differences are in units of per cent, the base year being 100%.
(-- indicates an entry that is not legible in the source.)
of columns in Table XXI, is given in Table
XXII, together with the minimum and maxi- 
mum difference. Differences in which weight- 
ing patterns A, B and H are concerned are the 
large ones; apart from these, there is only one 
average difference greater than 1.00, and only 
four maximum differences greater than 2.00. 

In Table XXIII are given the averages in 
the other direction—across the different index 
numbers, in all possible pairings for each year. 
Each “average difference” in this table is the 
average of 36 differences, between the col- 
umns of Table XXI, for a single row. Look- 
ing at it another way, each average difference 
in Table XXIII represents the average of the 
36 pairings shown in Table XXII, but broken 
down for only a single year, at a time. When 
proper weighting is applied, and errors ac- 
cumulating from the rounding of decimals are 
allowed for, the average for the whole table 
should equal the combined averages of Table 
XXII. 
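
The two directions of averaging described above can be sketched in Python. The index values below are transcribed from Table XXI (columns O and A-H, omitting the N.E.A. column); the function names are assumptions, and the sketch works from the rounded published values, so small discrepancies from the printed averages are to be expected.

```python
from itertools import combinations

YEARS = ["1928-29", "1929-30", "1930-31", "1931-32", "1932-33", "1933-34"]

# Experimental index numbers, one list per weighting pattern (Table XXI).
INDEX = {
    "O": [100.0, 99.0, 94.7, 89.3, 84.6, 87.3],
    "A": [100.0, 99.4, 96.4, 92.6, 88.6, 90.2],
    "B": [100.0, 99.3, 95.2, 90.4, 86.3, 88.5],
    "C": [100.0, 99.2, 94.5, 88.8, 84.5, 88.1],
    "D": [100.0, 99.0, 94.5, 88.6, 83.4, 85.4],
    "E": [100.0, 99.0, 95.2, 90.1, 84.7, 86.6],
    "F": [100.0, 99.1, 94.7, 88.4, 82.7, 85.8],
    "G": [100.0, 98.6, 93.3, 87.6, 83.1, 86.8],
    "H": [100.0, 98.4, 92.8, 86.5, 82.1, 86.0],
}


def pair_summary(a, b):
    """Average, minimum and maximum difference between two patterns over the years."""
    diffs = [abs(x - y) for x, y in zip(INDEX[a], INDEX[b])]
    return sum(diffs) / len(diffs), min(diffs), max(diffs)


def year_summary(row):
    """Average, minimum and maximum difference over all pairings for one year."""
    diffs = [abs(INDEX[a][row] - INDEX[b][row]) for a, b in combinations(INDEX, 2)]
    return sum(diffs) / len(diffs), min(diffs), max(diffs)


PAIRINGS = list(combinations(INDEX, 2))  # the 36 pairings of nine index numbers
```

With these values the O-B pairing averages 0.80 with a maximum of 1.7, and the A-H pairing averages 3.57 with a maximum of 6.5, in agreement with the corresponding entries of Table XXII.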

TABLE XXIII

DIFFERENCES IN COST-OF-LIVING INDEX NUMBERS FOR EACH YEAR, RESULTING
FROM DIFFERENT WEIGHTINGS

Each average difference is for the 36 different pairings of index numbers
for that year

Year        Aver. Diff.   Min. Diff.   Max. Diff.
1928-29     (base year)
1929-30     --            --           --
1930-31     --            --           --
1931-32     --            --           --
1932-33     --            --           6.5
1933-34     --            --           5.8

Mean        --            --           --

Differences are in units of per cent, the base year (1928-29) being 100 per
cent. Base year in which the differences were zero is not included in the
averages.
(-- indicates an entry that is not legible in the source.)


The large maximum differences in Table 
XXIII are due to the three weighting pat- 
terns already noted as causing extreme val- 
ues. Aside from these three patterns the 
maximum differences for the five years are as 
follows: 0.8, 2.4, 3.6, 2.6, and 2.7; and 
average differences are much less. In other 
words, when weights are applied which do not 
emphasize the idiosyncrasy of some unusual 
trait, differences in the resulting index num- 
ber values are not likely to exceed the limits 
of reasonable fluctuation. Cost-of-living index
numbers are not expected to be accurate
within three or four per cent when applied
over a large area, or to different income
groups; they probably are not accurate within
that range even for a specific locality and
particular expenditure level, on account of
ordinary problems of sampling.

One must of course note that there is a difference
of 11.3 between the N.E.A. values for
1932-33 and the index number for weighting
pattern A, as shown in Table XXI; and, even
excluding patterns A and B, there is a difference
of 7.5 for 1933-34 between the result
of the N.E.A. weighting and weighting C.
Such differences are larger than one would
care to have in his index numbers. It
must be remembered that the weighting patterns
in Table XX are assigned mechanically
and do not represent any intelligence. One
has merely to consider: Is it likely that anyone
would, by the exercise of judgment, assign
only 2 per cent (1/48) of the total expenditure
to rent, as is done in pattern C?
(Rent is the only trait which makes a significant
drop in 1933-34; all of the patterns
which accord it little weight are therefore
high for this year.) The differences which
are shown in Tables XXI-XXIII are not to
be taken as indicating the amount of error
likely to arise by virtue of reasonable weighting,
but rather as the amount of difference
that might arise when weighting is unreasonable:
to wit, pattern A with its 2 per cent
allowed for food.

It may be pointed out in passing that the
method used by the N.E.A. in handling trait
VII, which has no variability, while it is the
orthodox method in economic index numbers,
operates in the direction of lowering the variability
of the resulting index number. That
is, including a constant series tends to keep
the index number slightly nearer the base year
(or, more exactly, nearer the constant value
of the trait in question). If, on the other
hand, the trait is omitted from the summations,
on the basis that its variability is zero
and therefore its effectiveness in the positioning
of any case (year) is nil, the index number
derived from a given set of weights will be
somewhat livelier. Logic may, however, require
the inclusion of such series, as it probably
does in the present instance.
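
The damping effect of a constant series can be seen in a small numerical sketch. The Python below assumes an index of the weighted-sum type expressed as a per cent of the base year; the function name and the figures are hypothetical and are not taken from the N.E.A. data.

```python
def percent_of_base(weights, values, base_values):
    """Weighted sum of the series, expressed as a per cent of the base-year sum."""
    current = sum(w * x for w, x in zip(weights, values))
    base = sum(w * b for w, b in zip(weights, base_values))
    return 100.0 * current / base


# Two series that fall below their base values (hypothetical figures).
without_constant = percent_of_base([1, 1], [80, 90], [100, 100])

# The same two series plus a third that stays at its base value throughout.
with_constant = percent_of_base([1, 1, 1], [80, 90, 100], [100, 100, 100])
```

Here the index is 85.0 when the constant series is omitted and 90.0 when it is included; the constant series draws the result toward the base value of 100.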

Table XXI is not analyzed by the correla- 
tion technique, as in the preceding four ex- 
periments, because there are only six cases 
for each correlation, and the results would not 
be regarded as significant. The correlations 
are obviously high, and if rank correlations
were calculated, most of them would be perfect.
Another difficulty in the interpretation
of correlation values for such an index number
is that the range of observations is of
large moment in determining the value that
is found. The range of some twenty points
may or may not be regarded as a satisfactory
variation in cost-of-living index numbers for
the purpose of calculating correlations. This
problem does not arise in connection with the
full set of states, or counties of a state, where
the variability is complete, at least for any
given time represented by the index number.

It is of interest for the present experiment, 
however, to report that the average intercor- 
relation between the basic data is .821. This 
value is given, not as necessarily typical, but 
simply as a necessary factor in the consideration
of the results in Table XXI.


THE TYPE OF INDEX NUMBER CALCULATED

An index number is essentially a sum, or
average, of the weighted variables. Either
the original variables, or the average, is usually
expressed as a per cent of the values for
some base year or place. There are many
different formulas which can be used for calculating
an index number, and the values obtained
depend somewhat on the particular
formula used. After an elaborate analysis
of various formulas, Fisher¹⁸ selected eight
which he regarded as the best.

Six of these eight “best” formulas become
identical when constant weights are used,¹⁹
and therefore provide a natural form to use
for the present work. The common form
taken by these formulas is:


, 


or, more simply, but less specifically:

$$\frac{\sum wX_1}{\sum wX_0}$$


The numerator of this formula indicates that
the various traits (X) are to be weighted (w)
and summed for any given state (or year,
when time is the independent variable); this
numerator is then expressed as a ratio of a
similar summation representing values for a
base state or year.
The weights are constant from state to
state (that is, constant over a column), but 
vary from trait to trait (that is, w and X are 
both variables in any single index number 
value). 
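
The constant-weight form described above can be written out as a brief Python sketch. The function name and the figures are hypothetical; the point illustrated is only that the weights stay fixed from case to case while the trait values change, and that multiplying every weight by a constant leaves the index unchanged when the base is weighted in the same way.

```python
def constant_weight_index(weights, case_values, base_values):
    """Ratio of the weighted sum for a case to the weighted sum for the base, as a per cent."""
    numerator = sum(w * x for w, x in zip(weights, case_values))
    denominator = sum(w * x0 for w, x0 in zip(weights, base_values))
    return 100.0 * numerator / denominator


# Hypothetical trait values for a base and for two cases (states or years).
weights = [1, 4, 10]
base = [50, 20, 10]
cases = {"first": [60, 25, 9], "second": [45, 22, 12]}

indexes = {name: constant_weight_index(weights, x, base) for name, x in cases.items()}

# Multiplying every weight by 20 gives the same index numbers.
scaled = [20 * w for w in weights]
scaled_indexes = {name: constant_weight_index(scaled, x, base) for name, x in cases.items()}
```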

Base values are of course constant, and it 
was not necessary to divide through by base 
values for the first four experiments here re- 
ported, since such a division would not affect 
the correlations or the ranking. If it had 
been necessary, probably the index number 
based on average values for the 48 states 
would have been used as a base. In the fifth 
study, the values for 1928-29 were used as 
the base. 

In economic index numbers the weights or- 
dinarily vary from year to year. It does not 
seem appropriate to vary them in index num- 
bers for rating purposes; whether they should 
be varied for a cost-of-living index number 
may be debatable. 


FURTHER STUDIES OF WEIGHTING NEEDED 


The present investigation has been limited 
in scope, with the intention of presenting the 
effects of various weights under normal oper- 
ating conditions. It has not run down en- 
tirely and systematically any of the numer- 
ous avenues which have opened up as the work 
progressed. The following topics are sug- 
gested as fruitful for statistical experimen- 
tation: 


1. Explore more fully the possible arrange- 
ments of a given set of weights. Only a small 
sampling was used in the present experiments. 

2. Study the effects of more forceful sets of 
weights. 

3. Compare the variations produced by 
weighting, and those produced by different 
index number formulas. (Would probably 
involve variable weights.) 

4. Compare the variations in any series of 
index numbers published for the states, pro- 
duced by different reasonable weights as- 
signed to the traits by different judges, with 
variations between the index numbers pre- 
pared by different workers. (A comparison 
of the effects of reasonable weights with the 
validity of the traits selected to compose an 
index number. ) 

5. How should the force of a set of weights 
be measured? Various methods are possible; 
which method most closely indicates the abil- 
ity of a given weighted series to determine the 
position of a case in the final composite? 
6. What is the reasonable limit of relative
weight that can be assigned to a series having 
a given degree of uniqueness without fear of 
producing a change in the resulting com- 
posite or index number, that might be seri- 
ously misleading? (Assumes one is inter- 
ested in determining limits within which 
weighting may be done without fear of caus- 
ing significant error because of poor judg- 
ment. Within such limits many workers 
would be willing to weight, trusting that what- 
ever change was produced would be a ten- 
dency in the right direction.) 


SUMMARY 


Sets of weights, having a maximum ratio of 
10 or 11 to 1 have been applied to four differ- 
ent sets of data, and the resulting differences 
analyzed. The first set of data was used in
two forms: actual values and rank values.
The first three experiments dealt with index
numbers for educational activities of the 48
states; the fourth experiment was concerned
with education in 120 counties of one state, 
and the fifth experiment was made with a 
cost-of-living index number. The results, and 
the principal analyses, are presented in a 
series of tables as follows: 


                                      Exp. 1    Exp. 2    Exp. 3        Exp. 4        Exp. 5
                                      N.E.A.    N.E.A.    Schrammel     Chamberlain   Cost of
                                      Actual    Ranks     and                         Living
                                                          Sonnenberg
Weighting Patterns                    I         VII       XIII          XVII          XX
Index Numbers                         --        VIII      --            --            XXI
Intercorrelations of Index Numbers    III       XI        XIV           XVIII         --
Average Differences, for Group        V         --        XV            XIX           XXII
Differences for Individual Case       VI        --        XVI           --            XXIII


The conditions and results may be briefly presented as follows:

                                              Exp. 1   Exp. 2   Exp. 3   Exp. 4   Exp. 5
Population: Number of Cases                   48       48       48       108      6
Number of Traits                              5        5        11       15       8
Average Intercorrelation of Traits            .73      .67      .13      .52      .82

Average Intercorrelation of Index Numbers     .98      .96      .89      .95      --
Lowest Intercorrelation of Index Numbers      .94      .91      .78      .91      --
Average Difference in Placement of a Case     2.5      --       4.8      8.2      1.4

Largest Difference in Placement of a Case     19       --       28       52       6.5


CONCLUSIONS


An unrestricted generalization that the
weighting of index numbers is important, or
unimportant, is not justifiable. Weighting
may, under certain circumstances, be of the
greatest importance. On the other hand, under
ordinary conditions, moderate weighting
probably is not of as great importance as is
usually believed. It is necessary, however, to
bear in mind the factors which determine the
importance of weighting.

The effectiveness of weighting appears to
depend upon the following factors: (1) the
force of the set of weights; (2) the uniqueness
(lack of correlation) of a particular
series; (3) the shape of a series (as compared
with the shapes of the other series, when
plotted on some common scale); or the linearity
and general homogeneity in intercorrelations;
(4) the number of series entering into
the composite, or index number; (5)
whether the weights are constant, or variable.

Weights are not to be thought of as effective
solely at their face value. The character
of the series to which weights are assigned
appears to have as much to do with the effect
of the weights as the relative force of the
weights themselves.

When the heavy weights are attached to
series which are relatively unique, a significant
difference is almost certain to occur in
the placement of cases in the composite.

The general effect of weights on a set of
series is probably in proportion to some inverse
function of the average intercorrelation
of the series.


Particular cases (as states, or years) that 
are heterogeneous with reference to the val- 
ues in the different traits, will individually 
show greater response to weights than cases 
in the series which are more homogeneous 
in their values. 


Ranking the component traits before
weighting and combining into an index number
has only a moderate effect on the placement
of cases in the result. Ranking exerts
an influence in addition to that of making the
variability of the series equal. The extent of
this additional change depends on the shape
of the original frequency distribution, and it
is further acted upon by any assigned weights.

Rank correlation coefficients are substan- 
tially the same as product-moment correla- 
tion coefficients for the data here worked 
with; the rank correlations are slightly lower, 
on the whole. The difference between prod- 
uct-moment correlation coefficients and rank 
correlation coefficients is comparable to the 
difference which occurs in product-moment 
correlation coefficients when calculated with
actual values and product-moment correlation
coefficients when calculated from grouped
data. (The data in support of this point
were not presented in the report, but appear
in the work sheets.)


Equal weighting appears to give results 
somewhere around the middle between the 
worst possible weighting and the best possible 
weighting. 


No differences in index numbers were pro- 
duced by any of the experiments which could 
with assurance be said to exceed the limits of 
inaccuracy that should be allowed for validity 
of the traits selected to represent the general 
concept, and reliability of the reported data.


A set of weights is essentially a set of
ratios. They may be divided or multiplied by 
a constant without changing the effect on the 
placement of any case in the index number. 


Where the actual magnitude of the result- 
ing index number is important, in contrast 
with simply correct ratios between the val- 
ues, or with rank position, additional factors 
enter, and the situation is somewhat more de- 
manding. Multiplying or dividing the weights 
by a constant, however, will not affect the 
value of the index numbers if they are re- 
ferred to a base that is weighted accordingly. 


Does it pay to weight? The question of 
weighting in any particular case must be de- 
cided some way. Natural (observed) weight- 
ing is largely a product of the units that are 
used, if they differ from trait to trait. Equal 
weighting is arbitrary, and requires a decision 
as much as any other weighting. One may 
venture reasonable weights without fear of 
affecting the results markedly, if the series are 
well interrelated; and, if he believes that he
can assign weights that are somewhat better 
than the observed weights or than equal 
weighting, he should do so, taking advantage 
of whatever improvement may result. That 
is, there is little risk in moderate weighting, 
and whatever change there is may reasonably 
be expected to be a favorable one. 


One will not be absolute in his interpreta- 
tion of index numbers, recognizing that many 
acts of judgment necessarily enter into them, 
and that the values yielded are no better than 
the quality of judgment which has acted upon 
them at many points. 


NOTES 


1 Douglas E. Scates, “The General Nature and Applicability
of Index Numbers for Education,” Journal of Experimental
Education, IV:265-78. March, 1936.

2 For a brief review of these index numbers, see the reference
in footnote 1, or the following: “Statistical Analyses of
State School Systems,” pp. 104-112; in “Estimating State
School Efficiency,” Research Bulletin of the National Education
Association, Vol. X, No. 3, May, 1932.

3 Complete bibliographical references to the various studies
by Ayres, Phillips, and Schrammel will be found in the references
cited in footnotes 1 and 2.

4 H. E. Schrammel and E. R. Sonnenberg, “The Rank of
States According to Educational Achievement on the Basis of
Eleven Selected Criteria,” American School Board Journal,
43:17-19. November, 1936.

5 Douglas E. Scates, “Revised Index Number of State School
Systems,” American School Board Journal, 94:52-53. June,
1937.

6 “Present Standing of State School Systems on Five Factors
Related to Efficiency,” pp. 113-131; in “Estimating State
School Efficiency,” Research Bulletin of the National Education
Association, Vol. X, No. 3, May, 1932.

7 It will be noted that weights running from 1 to 11 are as
forceful as weights running from 20 to 220, or any others in
like proportion.

8 Douglas E. Scates and F. R. Noffsinger, “Factors Which
Determine the Effectiveness of Weighting,” Journal of Educational
Research, 24:280-85. November, 1931.

9 Within the limits of sampling of the various weighting
patterns, as previously pointed out. All of the statements in
this discussion must be interpreted as limited to the conditions
underlying the present set of data.

10 In the calculation of coefficients of this type, the methods
given in the following references are helpful:

Herbert S. Conrad, “On the Calculation of the Correlation
Between a Single Element of a Composite and the Remainder
of the Composite,” Journal of Educational Psychology, 26:
611-615. November, 1935.

Edwin E. Ghiselli and George Kuznets, “Short-Cut Methods
for Calculating Raw and Corrected Correlations Between a
Composite Variable and Its Components,” Journal of Educational
Psychology, 28:237-240. March, 1937.

11 Such a result would not necessarily be the case, since the
rank correlation coefficient is based on squares of the differences,
and may drop more rapidly than the average of the
differences increases. It may even drop when the average
difference decreases.

12 Data from the original report (note 4) were used, rather
than Scates’ revision which was made after the present study
was well along. Two obvious errors in printing were corrected
before the data were used, however.

13 Leo M. Chamberlain, “Measures of Educational Performance
in the County School Districts of Kentucky.” Bulletin
of the Bureau of School Service, College of Education, University
of Kentucky. Vol. VI, No. 4, June, 1934. 42 p.
Data from Table 1, pages 25-34.

14 “The Teacher’s Economic Position,” Research Bulletin of
the National Education Association, Vol. XIII, No. 4. September,
1935. Chap. V “Changes in Cost of Living with
A TEST FOR MEASURING TEACHERS’ KNOWLEDGE OF
THE CONDUCT AND PERSONALITY OF CHILDREN
FROM SIX TO EIGHT YEARS OF AGE*

MARTHA A. O’DANIEL RINSLAND


State Supervisor of Federal Nursery Schools and Family Life Education,
State Department of Education, Oklahoma City, Oklahoma


I. INTRODUCTION AND STATEMENT 
OF THE PROBLEM 


The newer trends in education are toward 
the development of the whole child. The 
way in which he learns to adjust to the school 
environment is of more import, and signifi- 
cance to the educator than the teaching of 
specific subject matter. While there is some 
disagreement among recognized authorities as 
to the techniques used in child development, 
there is almost a unanimity of opinion as to 
its importance. The newer conatus empha- 
size the importance of studying the child in 
all his learning situations. 


The becoming aware of something more 
subtle, more intrinsic, more potent, and of 
something of more moment than the three 
R’s has been exceedingly gradual, but quite 
definite. The way in which a child meets 
situations is more important to modern edu- 
cators, psychologists, mental hygienists, psy- 
chiatrists, and parents than speed in reading, 
the number of words in his vocabulary, or the 
accuracy and speed of learning number com- 
binations. The deadening process of “busy 
work” is being replaced by freedom to do 
original, creative work. This stimulates the 
child to do things for himself and to share 
with others. 


Today’s collimations in education are defi- 
nitely toward the development of more stable, 
happy and wholesome childhood, which is the 
foundation of hygienic adolescence and ad- 
justed adulthood. The important question 
asked of elementary teachers about their 
preparation is no longer primarily concerned 
with their grades in subject matter, but with 
their attitudes toward child life, family rela- 
tions, and teacher-parent appositions. Can 
she early detect, diagnose, and treat symp- 
toms in conduct cases? Can she distinguish 
the child from his fault? Can she talk unemo- 


* Summary of a thesis for the Ed. D. degree, University of 
Oklahoma, 1936. 
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tionally to the parent of the “problem child’’? 
Does she recognize the right of every child to 
a happy school environment? Does she see 
the individual's need of being understood on 
his own level? 

The child of this new day is being studied 
and treated in laboratories, clinics, hospitals, 
homes, communities, churches, and on the 
streets. Each specialist is taking one phase 
of the child and learning all that is possible 
concerning it. Biochemists are delving into 
the mysteries of the composite child and what 
effects certain chemicals have on his physical 
growth and emotional stability. Some pedia- 
tricians are studying glands and their influ- 
ence on the personality of the child. Dieti- 
tians are analyzing foods and ascertaining 
what each contributes to the health of the 
child. Mental hygienists are learning the 
causes of mental deviation and are preventing 
them from becoming permanent problems. 
Psychologists are studying how the child 
learns and why he behaves as he does. The 
need of these and other experts in the field is 
exigent. However, it is not under these spe- 
cialists, but under the care of a teacher that 
the child is placed for subject matter learn- 
ing and character building. She must be 
trained to synthesize all known facts for the 
wholesome growth of the child. 

The new subject matter of education is the 
child. His personality is so complex that it 
challenges the best teacher. It takes a mini- 
mum of six years for a physician to learn his 
profession and to be able to minister to the 
physical needs of the child. Years of study, 
both in high school and in college, are re- 
quired before one can teach English. The 
teacher is forced to absorb volumes of “sub- 
ject matter,” whereas only a course or two in 
the difficult and intricate subject of child 
learning is required. However, the present 
trends are leading toward the real subject
matter, the child as a large book of unknown
potentialities, which must be studied as no
other subject has been or is being studied.

Just how much do teachers know about the 
child’s growth and learning; about guiding 
and developing his personality? There are 
no tests by which to measure teachers’ knowl- 
edge of these things. This is the problem of 
this thesis. Therefore, the first step in this 
study demanded the construction of a compre- 
hensive test in this field of knowledge. An 
analysis will be made of teachers’ scores in 
the terms of a number of factors contributing 
to their knowledge of child development. 
Such factors will include their college courses 
in education, psychology, child development, 
and years of teaching. 


II. SOURCES AND STRUCTURE OF THE TEST


Personality factors are somewhat elusive 
and hard to define. Just what is wholesome 
and what is not wholesome in the children of 
early school years is very difficult to deter- 
mine. The relative importance of the differ- 
ent factors has not been determined by re- 
search. The literature in this field is largely 
psychological, physiological, and philosophical,
rather than factual, definite, conclusive,
and statistical. 

To secure a measurement of the teacher’s 
knowledge of personality potentialities and to 
discover how they are developed is a most 
perplexing problem. A valid test should be 
very comprehensive, but short enough to give 
to teachers whose day is so full that it is im- 
practical to use a test that takes more than 
two hours to complete. Laboratory experi- 
ments cannot be used in this type of study 
since it is largely a subject matter test on the 
teacher’s knowledge of the whole child. 

The test must sample broadly and measure 
accurately the teacher’s knowledge, which she 
has gained through study, observations, and 
experiences in teaching the young child. It 
must not appear to the teacher as a test, but 
as a questionnaire. This technique of meas- 
uring has the value of more nearly getting the 
teacher’s honest reactions to given statements. 
She is not confronted with a rating by her 
supervisor, and neither is she fearful of her 
tenure in the school as determined by her 
frank responses or scores on a test. While 
teachers give many tests, it is well known that 
they do not like to take them. 

In such a questionnaire the answers cannot 
be absolutely objective; therefore, there must 
be a variability of answers expressing degrees 
of correctness for each situation tested. The
relative degree of correctness can be deter- 
mined only by pooling the opinion of experts 
in the field. Data from such a questionnaire, 
however, may be considered objective as far 
as the teacher’s impartial answers and the in- 
vestigator’s detached manner of scoring are 
concerned. When the scoring is statistically
interpreted, the investigator is removed from
subjective interpretations of the teacher’s
knowledge of the child’s personality.

The objective form chosen for such a ques- 
tionnaire is the multiple choice, which admits 
of a rating by experts, and also of a choice 
in the best answer by the teacher. Thus de- 
grees of correctness in knowledge and in prac- 
tice can be measured. An odd number of 
possible answers or solutions is desired so that 
extremes may be shown and averages ascer- 
tained. Five responses for each question were 
chosen because they offer a sufficient degree of 
variability of knowledge and of practice. 
Bain¹ found that five degrees of teaching ability
could be accurately judged. In this form
the questionnaire is also a test. It appears to 
the teacher as a questionnaire from which an 
investigator is compiling teachers’ opinions, 
but in reality it is a test of teachers’ knowl- 
edge, sufficiently objective in nature to make 
a reliable measuring instrument for research. 

General or average practices, and not the 
treatment of individual cases, form the basis 
of each item to be tested. The statements 
are not grouped in categories, as chance place- 
ment more nearly prevents the teacher from 
being influenced by any perceptible order. 
One hundred and thirty-four items were se- 
lected for the test. 

The source material for the elements of this
test has come from books, magazines,* lectures
of experts in the field of child development,
and courses in child psychology, child
welfare, experimental psychology, general
psychology, mental hygiene, abnormal psychology,
educational psychology, parent education,
teaching, supervising, and family
relationships.

These sources may be considered as a cur- 
ricular validation of the test items. The 
items represent statements of problems actually
occurring in the writings and lectures
of our best authorities and in situations that
the writer knows are real in the lives of many
children.

1 Bain, W. E., An Analytical Study of Teaching in Nursery
School, Kindergarten, and First Grade. New York: Columbia
University, Teachers College, Bureau of Publications, 1928.

* An extended bibliography of 172 titles, on which this
article is based, is included in the original manuscript of the
thesis on file in the library of the University of Oklahoma.


The final test consists of one hundred and 
thirty-four statements about children’s con- 
duct and personality development. Each 
statement is followed by five possible answers. 
Teachers were asked to select the one answer 
which they thought to be best. 


The Construction of the Scoring Key 


In a test which does not have right and 
wrong answers, but where degrees of correct- 
ness exist, best answers must be determined 
by the pooled judgment of expert judges. 
Nine judges were chosen from the field of 
psychology, teacher training, and child de- 
velopment. The judges are from five differ- 
ent states and represent college professors, 
writers, and supervisors of teacher training 
and home making. 


A mimeographed copy of the Test-Ques- 
tionnaire was sent to each judge with the fol- 
lowing directions: 

Directions: Read each statement and the 
five answers following the statement. Rank 
the answers in order of your preference; i.e., 
number the best answer 1, the next best 2, the 
next 3, the next 4, and the poorest 5. Write 
the numbers above the answers or in the 
parentheses before the statement. 


The rankings by the nine judges of the five 
responses for each item of the Test-Question- 
naire were summated. These sums were then 
converted into final values for each response 
by assigning five points to the best answer 
and one point to the worst answer, with two, 
three, and four representing the intermediate 
values. Since there were nine judges and five 
responses rated by each judge from one to 
five points, the totals for a perfect agreement 
of judges for the five items would be: first or 
best, 9 points; second, 18 points; third, 27 
points; fourth, 36 points; and fifth, 45 points. 
These total points were converted to final val- 
ues of 5 (best), 4, 3, 2, and 1. 


Perfect agreement of judges was not ex- 
pected in many items. A difference between 
the totals for the successive items must be of a 
certain magnitude before it becomes signifi- 
cant; that is, before it will serve as an ade- 
quate basis for placing the one item above the 
other. A difference which would indicate a 
fair or acceptable degree of agreement should 
be numerically more than half the distance 
between any two sets of perfect totals. This
is 4.5 points, but no decimal can be taken as 
a significant figure, as all ratings were in 
whole numbers. Besides, ratings were more 
or less subjective and in many items two re- 
sponses, especially after the best choice, were 
of almost equal value to a judge. Therefore, 
a difference of 4 points was taken as sufficient 
evidence of agreement on near responses. 
Where a difference of less than 4 points ex- 
isted between close responses, the two re- 
sponses were considered to be of equal value 
and they received a value which was the aver- 
age of the two values which would have been 
given had they been clearly differentiated in 
the responses. Items not answered and 
marked “x” were scored zero since the direc- 
tions instructed teachers to mark “x” any item 
not known. Some sample items of the test 
with score values are given here; and Table I 
summarizes the ratings for all items. 
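
The conversion from summed ranks to final values can be sketched in Python. This is only an illustration under stated assumptions, not the investigator's own procedure: the function name `final_values` is hypothetical, and the treatment of runs of more than two close sums (all averaged together) is an assumption, since the text speaks only of pairs. The 4-point threshold and the values 5 to 1 come from the description above.

```python
def final_values(rank_sums, threshold=4):
    """Convert the judges' summed ranks for one item into final values.

    The lowest sum is the best answer (value 5), the highest the worst
    (value 1). Neighbouring sums that differ by less than `threshold`
    points share the average of the values they would otherwise receive.
    Assumption: a run of three or more close sums is averaged as one group.
    """
    n = len(rank_sums)
    # Indices of the responses, ordered from best (lowest sum) to worst.
    order = sorted(range(n), key=lambda i: rank_sums[i])
    values = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the group while the next sum is too close to separate.
        while (end + 1 < n and
               rank_sums[order[end + 1]] - rank_sums[order[end]] < threshold):
            end += 1
        # Position p (0 = best) would normally receive the value n - p.
        group = range(start, end + 1)
        mean_value = sum(n - p for p in group) / len(group)
        for p in group:
            values[order[p]] = mean_value
        start = end + 1
    return values


# Item 1 of Table I: sums 30, 15, 19, 30, 41.
print(final_values([30, 15, 19, 30, 41]))  # [2.5, 5.0, 4.0, 2.5, 1.0]
# Perfect agreement of nine judges.
print(final_values([9, 18, 27, 36, 45]))   # [5.0, 4.0, 3.0, 2.0, 1.0]
```

A difference of exactly 4 points separates two responses, while a difference of 3 or less ties them, which agrees with the rows of Table I that can be checked.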


Sources of Teachers’ Responses 


To ascertain the knowledge of teachers 
concerning the personality and conduct of 
children, a sampling was made in nine cities 
of three states in the southwest. Two hun- 
dred and fifty-two papers were returned com- 
pletely answered. This is an eighty-two per 
cent response, which is a very high percent- 
age in the light of the usual returns. 


The study had to depend largely upon vol- 
untary response of the teachers to the request 
of the superintendent, principal, or supervisor. 
The questionnaire was long, nine pages of 
multiple choice items, and one page of general 
information relative to the teacher’s training 
and experience. It was a request indeed to 
ask teachers to answer such a long question- 
naire. A shorter one could have been sent to 
a larger number of teachers, but since the first 
object of the sampling was to obtain the sub- 
ject matter of the test, it had to be rather 
inclusive. 


The first page included name and age of the 
person taking the test; present position; years 
of teaching; years of teaching children from 
six to eight years of age; number of college 
years; major study, number of courses in 
child psychology, child welfare, family rela- 
tions, and parent education; number of 
courses in general and educational psychol- 
ogy; and number of courses in general 
education. 
SAMPLE OF ITEMS FROM THE TEST WITH SCORE VALUES*

The same type of friendly relationships existing among children should exist
between children and teachers. 1 always 2 usually 3 sometimes 4 rarely
5 never.

Criticisms of children by their equals are more effective than those by their
teachers. 1 always 2 most of the time 3 rarely 4 never 5 harmful.

In dealing with children it is essential to find the motive back of the reaction.
1 of great moment 2 expedient 3 of some moment 4 of little importance
5 of no significance.

To develop individuals who are willing to accept the consequences of their
acts is the best type of teaching. 1 undoubtedly 2 usually 3 sometimes
4 seldom 5 not at all.

Compromise between conformity and individualization, should become more
efficient and pleasant for children. 1 always 2 usually 3 sometimes 4 rarely
5 never.

While the child is particularly negativistic, small and unimportant issues
should be avoided. 1 yes 2 as a rule 3 sometimes 4 seldom 5 no.

The standardized program develops all children equally. 1 certainly 2 often
3 sometimes 4 rarely 5 not at all.

Children learn more readily when they are distinguished from their faults.
1 always 2 usually 3 sometimes 4 rarely 5 no.

The happy child tends toward the practice of masturbation. 1 decidedly
2 usually 3 somewhat 4 very seldom 5 not at all.

Where teachers are fearful of their rating as teachers, the children’s behavior
responses reflect this anxiety. 1 absolutely 2 usually 3 somewhat 4 very
little 5 not at all.

The genetic approach on the part of the teachers insures happier teacher-pupil
relationships. 1 undoubtedly 2 very little doubt 3 doubtful 4 slightly
valuable 5 of no value.

Emotional disturbances cause children to do inferior work. 1 always
2 frequently 3 occasionally 4 very rarely 5 never.

Under difficulty, fear, or anxiety, speech defects become more pronounced.
1 as a rule 2 frequently 3 occasionally 4 rarely 5 no.

Economic conditions in the home have effect upon the attitude of the children
in school. 1 as a rule 2 frequently 3 occasionally 4 in a few instances
5 never.

The essential thing in developing the personality of the young child is to give
him factual information. 1 decidedly 2 to a large degree 3 sometimes
4 only incidental 5 no.

Understanding the child as an individual and his relationship to the group is
paramount in successful teaching. 1 yes 2 very important 3 somewhat
important 4 of slight importance 5 not essential.

* In square brackets are the values for the respective responses in order as given in each
item; that is, the first number is the value for response number 1, the second number is the
value for response number 2, and so forth.
TABLE I

SUM OF RATINGS OF NINE JUDGES ON FIVE RESPONSES FOR EACH ITEM OF THE TEST AND FINAL
VALUE OF RESPONSES FOR EACH ITEM

        Sum of Ratings on Each Response            Final value
Item      1    2    3    4    5           1     2     3     4     5
  1      30   15   19   30   41         2.5     5     4   2.5     1
  2      12   16   26   36   45           5     4     3     2     1
  3      13   17   24   36   45           5     4     3     2     1
  4      20   10   26   35   45           4     5     3     2     1
  5      42   22   13   23   35           1   3.5     5   3.5     2
  6      33   24   17   22   39           2   3.5     5   3.5     1
  7      45   36   27   14   13           1     2     3   4.5   4.5
  8      36   10   17   30   42           2     5     4     3     1
  9      45   36   27   16   11           1     2     3     4     5
 10       9   18   27   36   45           5     4     3     2     1
 11      28   11   17   34   45           3     5     4     2     1
 12      33   19   13   29   41           2     4     5     3     1
 13      30   13   14   34   44           3   4.5   4.5     2     1
 14      39   30   21   11   2?           1   2.5     4     5   2.5
 15      29   12   19   33   42           3     5     4     2     1
 16      23   11   20   36   45         3.5     5   3.5     2     1
 17      31   14   14   32   44         2.5   4.5   4.5   2.5     1
 18      14   11   23   32   40         4.5   4.5     3     2     1
 19       ?   14   20   35   44         3.5     5   3.5     2     1
 20      26   13   20   31   45           3     5     4     2     1
 21      38   29   13   19   36         1.5     3     5     4   1.5
 22      35   13   18   29   40           2     5     4     3     1
 23      32   12   15   33   43         2.5   4.5   4.5   2.5     1
 24      22   12   20   36   45         3.5     5   3.5     2     1
 25      23   13   19   35   45           3     5     4     2     1
 26      23   14   18   28   37           3     5     4     2     1
 27      35    ?   18   30   43           2     5     4     3     1
 28      23   18   22   29   43         3.5     5   3.5     2     1
 29      42   31   17   18   27           1     2   4.5   4.5     3
 30      30   12   18   32   43         2.5     5     4   2.5     1
 31      41   28   12   20   34           1     3     5     4     2
 32      45   35   24   12   19           1     2     3     5     4
 33      43   33   17   12   30           1   2.5     4     5   2.5
 34      40   28   19   12   21           1     2   3.5     5   3.5
 35      45   35   24   15   16           1     2     3   4.5   4.5
 36      15   16   23   36   45         4.5   4.5     3     2     1
 37      12   19   26   33   45           5     4     3     2     1
 38       ?   18   17   34   41           3   4.5   4.5     2     1
 39      42   30   23   17   23           1     2   3.5     5   3.5
 40      40   32   19   14   15           1     2     3   4.5   4.5
 41      45   36   25   15   14           1     2     3   4.5   4.5
 42      45   36   27   14   13           1     2     3   4.5   4.5
 43      45   36   27   17   10           1     2     3     4     5
 44      36   26   23   17   12           1   2.5   2.5     4     5
 45      45   36   27   16   11           1     2     3     4     5
 46      45   29   18   13   31           1   2.5     4     5   2.5
 47       8   16   24   32   40           5     4     3     2     1
 48      44   34   21   15   21           1     2   3.5     5   3.5
 49      45   36   27   15   12           1     2     3   4.5   4.5
 50      40   34   24   14   23           1     2   3.5     5   3.5
 51      43   33   24   14   21           1     2   3.5     5   3.5
 52      37   20   16   21   41           2   3.5     5   3.5     1
 53      29   10   17   34   45           3     5     4     2     1
 54       9   18   27   36   45           5     4     3     2     1
 55      44   35   25   1?   14           1     2     3   4.5   4.5
 56      39   27   15   19   35           1     3     5     4     2
 57      45   36   26   15   13           1     2     3   4.5   4.5
 58      16   13   25   36   45         4.5   4.5     3     2     1
 59      15   14   25   36   45         4.5   4.5     3     2     1
 60      13   18   26   33   45           5     4     3     2     1
 61      28   14   20   33   40           3     5     4     2     1
 62      25   15   15   36   44           3   4.5   4.5     2     1
 63      45   36   23   12   19           1     2     3     5     4
TABLE I (Continued)

SUM OF RATINGS OF NINE JUDGES ON FIVE RESPONSES FOR EACH ITEM OF THE TEST AND FINAL
VALUE OF RESPONSES FOR EACH ITEM

        Sum of Ratings on Each Response            Final value
Item      1    2    3    4    5           1     2     3     4     5
127      16   18   24   34    ?         4.5   4.5     3     2     1
128      45   34   25   19    ?           1     2     3     4     5
129      45   36   27   17    ?           1     2     3     4     5
130      41   34   27   19    ?           1     2     3     4     5
131       9   18   27   36    ?           5     4     3     2     1
132      11   16   27   36    ?           5     4     3     2     1
133      13   18   25   34    ?           5     4     3     2     1
134      31   17   10   35    ?           3     4     5     2     1


III. ANALYSIS OF TEACHERS’ SCORES


The responses for each item of the 252 tests
were scored by the key of values or answers
previously described. The values of the 134
items were summed to obtain a total score on
the test. The frequency distribution of the
total scores with the mean, the standard
deviation, and the standard error of the mean is
given in Table II.

The best unit for comparison of a teacher’s
ratings with the judges’ ratings is an average
and not a total score. Therefore, for each
test, the total score was divided by the number
of items, 134, to obtain the mean value
score. The maximum obtainable total score
would be from a test with perfect agreement
with the key. This would be a 643.2 total
score, or a mean value score of 4.8. The
frequency distribution of the mean value score,
with the mean, the standard deviation, and
standard error of the mean value scores is
given in Table III. Hereafter the scores used
will be the mean value scores and not the total
scores. These mean value scores will be
called the “scores.”

These distributions show close approximation
to the normal curve for 252 cases. This
is an index of both validity and sampling. If
the distribution of scores from a large unselected
group shows close approximation to a
normal curve, it is evidence of discrimination
of abilities which may be assumed to exist
naturally. Discrimination is an index of
validity and reliability, validity being inferred
from approximation to normal distribution,
and reliability being inferred from the wide
range of total scores (from 425 to 580). The
difficulties of the test items are well distributed
if the distribution of scores approximates
normality for a group assumed as being normal.
Monroe* says, “This index of validity
refers to the differentiation of scores for
pupils possessing different degrees of ability.
It is obvious that any lack of objectivity or
reliability will result in a lack of discrimination
for certain pupils.” Monroe and Engelhart*
say, “A valid test reveals differences in
ability that exist and one that fails to reveal
a difference which is at all marked is distinctly
lacking in this quality.” Validity may be
inferred from the joint presence of curricular
validity and high reliability.

* Monroe, Walter S., An Introduction to the Theory of
Educational Measurements, Houghton Mifflin Company, 1923, p.
219.


TABLE II

FREQUENCY DISTRIBUTION OF TOTAL SCORES

Score                  f
580-584                1
575-579                4
570-574                1
565-569                7
560-564                8
555-559                9
550-554               11
545-549               13
540-544               15
535-539               13
530-534               15
525-529               23
520-524               12
515-519               16
510-514               14
505-509               16
500-504               11
495-499               11
490-494               18
485-489                3
480-484               10
475-479                3
470-474                2
465-469                5
460-464                5
455-459                0
450-454                1
445-449                2
440-444                0
435-439                0
430-434                0
425-429                3
Total                252
Mean               517.4
S.D.                29.7
S.E. of mean         1.9


TABLE III

FREQUENCY DISTRIBUTION OF MEAN SCORES

Score                  f
4.4                    1
4.3                    ?
4.2                   24
4.1                   32
4.0                   32
3.9                    ?
3.8                    ?
3.7                    ?
3.6                   15
3.5                   11
...                    ?
Total                251
Mean                3.88
S.D.                   ?
S.E. of mean         .01
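
The summary statistics of Table II can be checked with a few lines of Python. This is a minimal sketch, assuming the standard error of the mean was obtained as the standard deviation divided by the square root of the number of cases; the function names are illustrative and do not come from the original study.

```python
import math


def standard_error_of_mean(sd, n_cases):
    """Standard error of the mean: S.D. divided by the square root of N."""
    return sd / math.sqrt(n_cases)


def mean_value_score(total_score, n_items=134):
    """Mean value score: a teacher's total score divided by the 134 items."""
    return total_score / n_items


# S.D. of the total scores (29.7) and number of teachers (252) from Table II.
print(round(standard_error_of_mean(29.7, 252), 1))  # 1.9
# A hypothetical total score of 536 corresponds to a mean value score of 4.0.
print(mean_value_score(536))  # 4.0
```

Under this assumption the computed standard error agrees with the 1.9 reported in Table II.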

The reliability of the test was established 
by correlating even with odd numbered items 
and correcting by the Brown-Spearman
Prophecy Formula. The correlation ($r_{12}$) is
.854 ± .012; the reliability coefficient ($r_{11}$) is .922
± .013; the standard error of a raw score,
$\sigma\sqrt{1 - r_{11}}$, is 8.38. The high reliability, the
probable high discrimination, and the validity 
of the test make it a valuable measuring in- 
strument for determining the true knowledge 
teachers have of the character and personality 
of children from six to eight years of age. 
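
The reliability figures can be approximated with the following Python sketch. It assumes the usual two-half form of the Spearman-Brown prophecy formula and takes the standard deviation of total scores (29.7, from Table II) as the sigma in the standard error of a score; both are assumptions about how the original computation was carried out, and the function names are illustrative. The results fall near, though not exactly on, the reported .922 and 8.38, which may reflect rounding in the original work.

```python
import math


def spearman_brown(r_half, factor=2):
    """Step up a split-half correlation to the reliability of the full test."""
    return factor * r_half / (1 + (factor - 1) * r_half)


def standard_error_of_score(sd, reliability):
    """Standard error of a raw score: sigma * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


reliability = spearman_brown(0.854)
print(round(reliability, 3))                           # 0.921
print(round(standard_error_of_score(29.7, 0.922), 2))  # 8.29
```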

Certain facts about each teacher’s training 
and experience were obtained by a short ques- 
tionnaire attached to the first page of the test. 
Two methods of interpreting the meaning of 
scores in the light of such contributing factors 
are possible. The first is the method of correlation
analysis. This method presents difficulties
when correlations are low. From observation
and tabulation of scores and contributing
factors the correlations appear to be
low. Odell* shows that when correlations are
low, interpretations of the coefficient “...
through approach to or departure from perfect
correlation” are not valuable nor accurate.
This method will not be used. The second
method compares the mean score of a
high group, the mean score of a middle group,
and the mean score of a low group. If the
differences of means are slight, correlations
would be low. The upper quarter will be
taken as the high group, the middle half will
be taken as the middle group, and the lower
quarter will be taken as the low group. Each
of the factors which might influence scores
will be analyzed by comparisons of the means
of the three groups.

* Monroe, Walter S. and Engelhart, Max D., The Scientific
Study of Educational Problems, The Macmillan Company,
1936, p. 181.

* Odell, Charles W., Statistical Methods in Education, D.
Appleton-Century Company, 1935, pp. 192-199.

Normally, one would expect the number of 
years in college to have a positive effect on 
teachers’ knowledge of children’s personality 
and conduct. The range of years in college 
was divided into three groups and the mean 
score of each of the three groups calculated 
The data are presented in Table IV. There 
is a slight increase in scores from the low 
group to the high group. The difference be- 
tween the lower quarter and middle half is 
.02, and the difference between the middle
half and the upper quarter is .18. One difference
is not significant, the other is significant.
When these differences are converted
into total scores they are, respectively, 2.68
and 24.12 (.02 × 134 = 2.68 and .18 × 134
= 24.12). Three points difference in a test,
where the maximum total score is 643.5
points, is not significant. However, 24 points
are 4/5 (24/29.7) of a sigma and are significant.
The usual technique of statistical
significance of difference of means is not applied
here as the purpose of these data is not
primarily to generalize such differences for
the total population, but to show differences
in the groups compared. The overlapping of
the total distributions of the group is very
large. In fact every group has cases of extremely
low and extremely high scores. Lincoln*
has shown that the significance, by the
usual method of significance of differences of
means, is often exaggerated when viewed by
the overlapping of total distributions. These
factors were considered and the usual interpretations
of significance rejected as not contributing
to an understanding of the data.
These small differences of means indicate very
low correlations, and support the decision to
reject the method of correlation analysis.

* Lincoln, Edward A., “The Insignificance of Significant
Differences,” Journal of Experimental Education, March, 1934,
pp. 288-290.


TABLE IV

MEAN SCORE OF TEACHERS IN THE UPPER QUARTER, MIDDLE HALF, AND LOWER QUARTER
OF YEARS IN COLLEGE

Group            Number of Years   Number of Teachers   Mean Score
Upper quarter           ?                  29               4.06
Middle half             ?                 195               3.88
Lower quarter           ?                  21               3.86
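
The conversion of mean-score differences into total-score points and sigma units can be sketched as follows. This is an illustration only: the constants are the 134 items and the standard deviation of 29.7 reported for the total scores, and the function names are hypothetical.

```python
N_ITEMS = 134     # number of items in the test
SD_TOTAL = 29.7   # standard deviation of total scores (Table II)


def to_total_points(mean_score_difference):
    """Express a difference between mean value scores in total-score points."""
    return mean_score_difference * N_ITEMS


def in_sigma_units(total_points):
    """Express a total-score difference as a fraction of one sigma."""
    return total_points / SD_TOTAL


for difference in (0.02, 0.18):
    points = to_total_points(difference)
    print(round(points, 2), round(in_sigma_units(points), 2))
# 2.68 0.09
# 24.12 0.81
```

The larger difference is thus about four fifths of a sigma, and the smaller less than one tenth.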


Courses in psychology and education should 
have appreciable positive effects on scores on 
the test. The data of the three groups, in 
terms of the number of courses, are given in 
Table V. The differences here are also slight. 
The difference from the low group to the middle
group is .01, the difference from the middle
group to the high group is .08, and the
differences are not in favor of amount of 
training. It seems that after a teacher has 
had a number of courses there is no improve- 
ment in score by the addition of courses. 
Perhaps, semester hours would have been a 
more accurate measure of training, but such 
accuracy would not have given more signifi- 
cant differences. 


TABLE V

MEAN SCORE OF TEACHERS IN THE UPPER QUARTER, MIDDLE HALF, AND LOWER QUARTER
OF COURSES IN PSYCHOLOGY AND EDUCATION

Group            Number of Courses   Number of Teachers   Mean Score
Upper quarter            ?                    5               3.81
Middle half              ?                  169               3.89
Lower quarter            ?                   72               3.90


If courses in psychology and education do 
not show positive appreciable effects on the 
test, perhaps courses in child development do. 
The data for these courses are in Table VI, 
and show only slight differences. The difference
between the low group and the middle
group is .08. No difference between the middle
and the high group is evident. These
data raise a serious question concerning the 
subject matter of these courses. 


TABLE VI

MEAN SCORE OF TEACHERS IN THE UPPER QUARTER, MIDDLE HALF, AND LOWER QUARTER
OF COURSES IN CHILD WELFARE

Group            Number of Courses   Number of Teachers   Mean Score
Upper quarter            ?                   13               3.92
Middle half              ?                  184               3.92
Lower quarter            ?                   41               3.84
If courses in college do not correlate with
scores on this test, perhaps experience does.
Table VII gives data showing the means of
three groups based upon the number of years
teaching. The middle group, teachers who
have taught from 12 to 31 years, make the
highest score, though it is only .03 points
higher than teachers who have taught from
2 to 11 years, and only .03 points higher than
teachers who have taught more than 31 years.
There are only 9 teachers in the 32-42 group,
as few teachers in public schools have taught
more than 31 years.


TABLE VII

MEAN SCORE OF TEACHERS IN THE UPPER QUARTER, MIDDLE HALF, AND LOWER QUARTER
OF YEARS TEACHING

Group            Number of Years   Number of Teachers   Mean Score
Upper quarter         32-42                 9               3.80
Middle half           12-31               149               3.87
Lower quarter          2-11                91               3.84


The test measures teachers’ knowledge of 
young children, therefore teaching young chil- 
dren should indicate more knowledge or higher 
scores on the test. Table VIII shows that 
teachers who have taught from 12 to 31 years 
make .06 points more than teachers who have
taught from 2 to 11 years, and .04 points
more than those who have taught from 32 to 
42 years. These differences are not large. 


TABLE VIII

MEAN SCORE OF TEACHERS IN THE UPPER QUARTER, MIDDLE HALF, AND LOWER QUARTER
OF YEARS IN TEACHING YOUNG CHILDREN

Group            Number of Years   Number of Teachers   Mean Score
Upper quarter         32-42                 ?               3.87
Middle half           12-31               112               3.91
Lower quarter          2-11               127               3.85


Do teachers who major in education make 
higher scores on this test than teachers who 
major in other departments? They make 
slightly higher scores as is shown in Table IX. 
The difference is .04 in favor of education
majors. 

TABLE IX

MEAN SCORE OF EDUCATION MAJORS AND MEAN SCORE OF ALL OTHER MAJORS

Record                Number of Teachers   Mean Score
Educational Majors           151               3.83
Other Majors                  91               3.79
In general no factors were found which
seriously affect scores on the test. Differ- 
ences though small do exist in a number of 
factors. However, they might have consid- 
erable effect in teaching. It is difficult to in- 
terpret the real meaning of these differences. 
Statistical differences may not answer the 
questions involved. Sampling may have 
played an important role. These teachers 
were undoubtedly a select group. They teach 
in the best schools in this section of the coun- 
try and were selected by their administra- 
tive officers to assist in this study. This 
makes the range of scores limited and differ- 
ences not statistically significant. Intelli- 
gence per se was not measured. It may be 
that the determining factor, correlated with 
test scores, is intelligence. If grades from all 
colleges could have been secured and reduced 
to comparable units, they might have revealed 
significant differences. Further investigations 
may yield significant relationships. 

In order to determine how much teachers 
knew about each item of the test, all teachers 
scoring 4 or more were credited with consid- 
erable knowledge and all teachers scoring 3.5 
or below were credited with little knowledge. 
Of the five possible responses to each item 
only the first two choices really represent 
worthwhile or desirable solutions. These 
credit values include 5, 4.5, and 4. Chance 
responses would probably yield an average 
rating of 3. 

On close examination of the items best 
known or missed, the investigator found no 
specific grouping of a particular area of sub- 
ject matter known best or least known by the 
teachers who assisted in the study. This is 
not unlike knowledge in other fields of sub- 
ject matter. 

Any practical use of a test or scale requires 
norms. A sampling of 252 cases in better 
schools furnishes tentative norms. Two 
types of norms are given in Table X. One of 
the oldest ideas of expressing efficiency is the 
percentage scale. The maximum score on the 
test is 634.5. If this is divided into each 
score and multiplied by 100, it will give a
percentage score. A second and more recent 
norm is the percentile score. A percentile 
score is a statement of a score in terms of a 
relative or percentile position in the distribu- 
tion of the whole group. Any percentile for 
a score shows the percentage of cases making 
this score or less. Zero is the lowest per- 
centile and 100 is the highest. It is widely 
used and is one of the most useful norms for
tests in high school, college, and vocational 
groups. 
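
The two norms can be sketched in Python. This is a hedged illustration: it takes the maximum score of 634.5 stated in this paragraph, uses the upper limit of each score interval as the numerator of the percentage, and reads "percentage of cases making this score or less" as a simple cumulative percentage. The function names are illustrative and not from the original study.

```python
from itertools import accumulate


def percentage_score(upper_limit, max_score=634.5):
    """Upper limit of a score interval as a percentage of the maximum score."""
    return 100 * upper_limit / max_score


def percentile_scores(frequencies, total=None):
    """Cumulative percentage of cases at or below each interval.

    `frequencies` must be ordered from the lowest interval to the highest.
    `total` defaults to the sum of the frequencies given.
    """
    total = sum(frequencies) if total is None else total
    return [100 * c / total for c in accumulate(frequencies)]


# Highest interval, 580-584.
print(round(percentage_score(584), 1))  # 92.0

# Five lowest intervals (425-429 up to 445-449) out of 252 teachers.
low_end = percentile_scores([3, 0, 0, 0, 2], total=252)
print(round(low_end[0], 1), round(low_end[-1], 1))  # 1.2 2.0
```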


IV. CONCLUSIONS 


Before teachers’ knowledge of the conduct 
and personality of children could be ascer- 
tained, a test had to be constructed which 
would measure objectively such knowledge.
The objective type chosen is the multiple 
choice form, with five possible choices for 
each item of the test. A key of answers to 
the 134 items of the test was derived from 
the pooled opinion of nine experts in the 
fields of education, psychology, and child 
development. 

After examining the distribution of scores
of the 252 teachers from nine of the better
schools of three states of the Southwest, the
test was found to be diagnostic because of its
high discrimination, high curricular validity,
and high reliability.


TABLE X

PERCENTILE SCORE AND PERCENTAGE SCORE* FOR THE TOTAL SCORES ON THE TEST

Total                     Percentile   Percentage
Score        Frequency      score        score
580-584           1          99.6         92.0
575-579           4          99.2         91.3
570-574           1          97.6         90.5
565-569           7          97.2         89.7
560-564           8          94.4         88.9
555-559           9          91.3         88.1
550-554          11          87.7         87.3
545-549          13          83.7         86.5
540-544          15          78.6         85.7
535-539          13          72.6         84.9
530-534          15          67.5         84.2
525-529          23          61.5         83.4
520-524          12          52.4         82.6
515-519          16          47.6         81.8
510-514          14          41.3         81.0
505-509          16          35.7         80.2
500-504          11          29.4         79.4
495-499          11          25.0         78.6
490-494          18          20.6         77.9
485-489           3          13.5         77.0
480-484          10          12.3         76.3
475-479           3           8.3         75.5
470-474           2           7.1         74.7
465-469           5           6.3         74.0
460-464           5           4.4         73.1
455-459           0           3.6         72.3
450-454           1           2.4         71.6
445-449           2           2.0         70.7
440-444           0           1.6         70.0
435-439           0           1.6         69.2
430-434           0           1.6         68.4
425-429           3           1.2         67.6
Total           252

* Percentage score is the maximum score
from the judges’ pooled opinion divided into each total
score. As total scores are grouped, the upper
limit of each step interval is taken as the
numerator of the percentage fraction.

Some pertinent questions were raised from 
facts that were revealed by the test. The 
number of years in college is significant. The 
type of courses taken and the number of years 
teaching show only a slight positive influence
on the scores made on the test.

Colleges are, perhaps, not making student 
teachers cognizant of the child as the dynamic 
subject matter in education and child psy- 
chology. Colleges are, perhaps, not teaching 
potential teachers to recognize and diagnose 
the conduct and personality of children. Col- 
leges are, perhaps, failing to inculcate teach- 
ing knowledge and techniques in terms of the 
whole child. 

The investigator found no specific grouping 
of a particular area of subject matter known 
best or least by the teachers who assisted in
this study. The usual wide range of individual
differences found in other tests was disclosed
in this test. This is not unlike knowledge
in other fields of subject matter. Since
no data were found which account for these
variations, the investigator is led to believe
that a high degree of intelligence is a prime 
factor in transferring college training and 
teaching experiences into integrated knowl- 
edge of the conduct and personality of chil- 
dren from six to eight years of age. The 
kinds of information measured by this test 
certainly measure integrated knowledge. 

The test is inclusive in what it purports to 
measure; therefore it might be used advan- 
tageously in teacher training institutions. The 
practical use of a test or scale requires norms. 
Tentative norms in the forms of percentiles 
and percentages are present. They may be 
used by professors of child psychology and 
child development and by school executives 
in their selection of teachers of young children. 














THE NATURE OF THE ABILITIES REQUIRED IN THE SURVEY 
COURSES OF THE CHICAGO CITY JUNIOR COLLEGES* 


Max D. ENGELHART 


Department of Examinations 
Chicago City Junior Colleges 


Introductory Statement 

A chief objective of the Chicago Junior 
Colleges is to provide the means whereby all 
students can acquire a general education; 
hence, the curriculum of the colleges includes 
survey courses in the following fields: Eng- 
lish composition, the humanities, biological 
science, physical science, and social science. 
The typical freshman enrolls in English com- 
position, the first year of social science, bio- 
logical or physical science, and a number of 
elective courses. The typical sophomore enrolls
in the second year of social science,¹
the humanities, biological or physical science, 
and a number of elective courses. These sur- 
vey courses are taught largely by means of 
lectures. One class hour per week in each is 
devoted to discussion, recitation, and testing. 
Achievement is measured at the close of the 
year by means of three-hour comprehensive 
examinations prepared by the Department of 
Examinations with the cooperation of the 
faculties concerned. The mark for the year 
of study is wholly dependent on the degree of 
attainment of the student on the compre- 
hensive examination. 


The Problem of This Investigation 

It may be assumed that recognition of the 
attainment of general education as a chief 
objective of the city junior colleges specifies 
in itself a restriction on the nature of the abil- 
ities required for successful achievement in 
the survey courses. Acceptance of this ob- 
jective implies that the curriculum materials, 
methods of instruction, and methods of meas- 
urement shall not necessitate the utilization 
of special talents by the students. In other 
words, achievement in each of the surveys 
should be largely dependent on capacities 
which all students are likely to have, or abilities
which all students are likely to acquire
in varying amount. The student of average
general competence and, hence, capable of securing
satisfactory marks in some of the survey
courses, should not fail to secure a satisfactory
mark in any one of the survey courses
because of a need of unique abilities which he
does not possess.

* Mr. Hugh B. Lewis, a graduate student of the University
[...], assisted the writer in the statistical treatment of
the data.

¹ A recent change in policy has reduced the requirement in
social science from two years to one year. The data used in
this study were collected while the two-year requirement was
in force.


It is the purpose of this study to determine 
the extent to which the abilities required in 
the survey courses are general and the extent 
to which they are unique to each. It is not 
the problem of the investigation to identify 
or to name the abilities so classified. Some 
suggestions may be made, however, regarding 
possible names of these traits, with the qual- 
ification that these efforts at identification are 
not based on the quantitative data reported. 


It is sufficiently important to establish the 
truth of the hypothesis that the individual 
differences in achievement are largely due to 
abilities common to all of the survey courses. 
Such a conclusion has significant application 
in efforts to solve the practical problems of 
curriculum construction, methods of instruction,
and personnel work in the colleges.


The Sources of Data 

The original data used in this study are 
scores on the six comprehensive examinations 
and on an intelligence test, the Psychological 
Examination of the American Council on Edu- 
cation. Two samples of data were used. 
The first sample consisted of the scores on 
each of the six comprehensive examinations of 
100 students who had taken all six of the ex- 
aminations. The second sample consisted of 
the scores of another group of 100 students on
each of the six comprehensive examinations 
and on the intelligence test. All three col- 
leges were represented in each of the samples. 
Both samples were chosen in a random fash- 
ion. The only departure from random selec- 
tion was the rejection of cases where the data 
were incomplete. 
The Techniques Used

The techniques used in analyzing the data 
will be described in later paragraphs. It
may be mentioned here that the original data
were used in the calculation of Pearson
product-moment coefficients of correlation between
all possible series of paired scores for both of
the samples referred to above. Fifteen coefficients
of correlation were calculated from the
first sample of original data. Twenty-one
coefficients of correlation were calculated from 
the second sample of original data. Both sets 
were subjected to two methods of analysis in 
efforts to determine the extent to which the 
abilities underlying the correlations are gen- 
eral and the extent to which they are unique 
to the survey courses concerned. It was felt 
that the use of two independent samples and 
of two methods of analysis would more ade- 
quately support the conclusions derived from 
the study. 
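
The counts of fifteen and twenty-one coefficients follow from the number of possible pairs among six and seven series of scores. A minimal sketch is given below; it assumes the scores are held in a dictionary keyed by examination name, and the function names and the toy scores are hypothetical rather than drawn from the study.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation: sum of deviation products over N*sd_x*sd_y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sd_x = sqrt(sum(a * a for a in dx) / n)
    sd_y = sqrt(sum(b * b for b in dy) / n)
    return sum_xy / (n * sd_x * sd_y)


def all_pairwise_r(scores):
    """Correlate every possible pair of score series (dict: name -> list)."""
    return {(a, b): pearson_r(scores[a], scores[b])
            for a, b in combinations(scores, 2)}


def number_of_pairs(k):
    """Number of distinct pairs among k series of scores."""
    return k * (k - 1) // 2


print(number_of_pairs(6))  # 15 coefficients for six examinations
print(number_of_pairs(7))  # 21 coefficients when intelligence is added

# Hypothetical toy scores for three series (illustration only).
toy = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}
table = all_pairwise_r(toy)
```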


Limitations of Data and Techniques 


It is probable that the comprehensive exam- 
inations do not measure all of the desirable 
changes made in students as a result of in- 
struction in the survey courses. The attain- 
ment of a general education should mean that 
the students have acquired not only knowl- 
edge and the ability to apply knowledge in the 
survey fields, but also attitudes, ideals, and 
interests. The comprehensive examinations 
measure effectively the extent to which knowl- 
edge has been acquired. They measure some- 
what less effectively and somewhat problemat- 
ically the extent to which the students are 
able to use knowledge in their thinking in 
the fields concerned. They do not measure 
directly the extent to which attitudes, ideals, 
and interests have been engendered. There 
is, of course, the possibility of indirect meas- 
urement because of the correlation of abili- 
ties. If the measurement is restricted to 
knowledge, and the students who have knowl- 
edge in greater amount also have the less tan- 
gible traits in greater amount than the stu- 
dents of lesser knowledge, then the tests meas- 
ure the intangible traits indirectly. It may 
be that the scores on the tests are in part a 
result of the possession of these traits in vary- 
ing degree. It may be that the general abil- 
ity, or abilities, later measured but not identi- 
fied from the data are inclusive of these in- 
directly measured traits. It is possible that 
the unique abilities are in part these less tan- 
gible, but decidedly desirable, traits. This 
limitation does not, however, significantly reduce
the dependability of the conclusions derived
from the quantitative data, although it
does restrict the dependability of the more
speculative inferences concerning them.


The samples of 100 students each are somewhat
limited in size. The use of two samples,
however, represents an effort to meet a pos- 
sible criticism of inadequacy of data from the 
standpoint of sampling. The examinations 
are not perfectly reliable instruments, but re- 
liability determinations, not reported here, 
have indicated comparatively high reliability. 
The coefficients of reliability usually obtained
for the comprehensive examinations are above
.90 and, in some cases, as high as .97. In a
given series of scores on the comprehensive
examinations, the scores do not all refer to the 
same edition of the examination. However, 
the fact that the derived scores were used in 
calculation meets this limitation in large meas- 
ure. Departures of scores from perfect com- 
parability would probably tend to reduce the 
sizes of the coefficients of correlation obtained. 
The indices of the general abilities would be 
reduced in size on this account. These in- 
dices are so large, however, that this data 
fault cannot be regarded as of much 
significance. 


The necessity for using data complete with 
respect to six examinations for each student 
means that the samples are not altogether 
representative of the college populations. Se- 
lection has operated, causing the groups to be 
somewhat more competent than the student 


body as a whole. However, the operation of 
such a factor of selection should have tended 
to reduce the correlations, since coefficients 
obtained from restricted ranges of talent are 
usually smaller than those obtained from 
wider ranges. The fact that the correlations 
between the scores on the comprehensive ex- 
aminations are high in spite of this limitation 
indicates that it is possibly not a serious one. 
It may be that possession of the general abil- 
ity is itself a factor in persistence in attend- 
ance for two years of junior-college instruc- 
tion, or conversely, that the general ability is, 
in part, the trait “perseverance.” 
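
The statement that coefficients obtained from restricted ranges of talent are usually smaller can be illustrated by a small simulation. The sketch below is hypothetical throughout: the population correlation of .70, the sample size, and the rule of keeping only the upper half on one variable are assumptions chosen for illustration, not features of the samples used in this study.

```python
import numpy as np

# Simulate two traits with an assumed population correlation of .70.
rng = np.random.default_rng(0)
n, rho = 20000, 0.7
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

# Correlation in the full, unselected group.
r_full = np.corrcoef(x, y)[0, 1]

# Selection: keep only the cases above the median on x (a restricted range).
keep = x > np.median(x)
r_restricted = np.corrcoef(x[keep], y[keep])[0, 1]

print(r_full, r_restricted)  # the restricted coefficient is the smaller one
```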

The techniques used in analysis of the data 
were devised in efforts to identify and meas- 
ure human abilities. They are not appropri- 
ate as means of identification of abilities when 
the traits measured are very complex in nature.
It is obvious that the comprehensive
examinations and the intelligence test are com- 
plex in that they measure varieties of traits. 
However, as has been indicated in the defini- 
tion of the problem of this study, the purpose 
is not to identify the general and unique abili- 
ties from the data of this study. The prob- 
lem is to determine the extent to which abili- 
ties are common to the survey courses, and 
the extent to which they are unique. Any in- 
ferences respecting the nature of the abilities 
are admittedly speculative. The inferences 
are made only because they may serve to stim- 
ulate further investigations in this field, and 
investigations in which the variables analyzed 
are less complex in character. This is in a
sense a pioneer study, an exploratory investigation.


A Description of the Analytical Techniques 
The principle that, when two phenomena
vary concomitantly, one of the phenomena is
the cause of the other, or they are both due
to a common cause, was expressed by John
Stuart Mill³ several years before the idea
of correlation and its measurement by means
of a coefficient was developed by Sir Francis
Galton.⁴ The calculation of coefficients of
correlation was made more precise by the
development of more rigorous mathematical
techniques by Karl Pearson.⁵ His formula

$$ r = \frac{\sum xy}{N \sigma_x \sigma_y} $$

is used in a modified but mathematically
equivalent form in the calculation
of the correlation coefficients analyzed in this
study. Pearson and Yule may be credited
with the development of partial and multiple
correlation,⁶ although other statisticians have
increased our knowledge of the theory and 
have devised more economical methods of 
calculation. In recent years much use has 
been made of these techniques in efforts to 
determine the relative and combined contribu- 
tions of several variables considered as causes, 
to another variable considered to be the effect 
of the given variables. It has been shown 
that these techniques have certain serious 


² Thurstone, L. L. "Current Misuse of the Factorial Methods,"
Psychometrika, 2:72, March, 1937.

³ Mill, J. S. A System of Logic. New York: Longmans,
Green and Company, 1906, p. 263. This volume was first
published in 1843.

⁴ About 1875. See Walker, Helen M. Studies in the
History of Statistical Method. Baltimore: The Williams and
Wilkins Company, 1931, p. 103-106.

⁵ Pearson, Karl. "Mathematical Contributions to the Theory
of Evolution. III. Regression, Heredity, and Panmixia,"
Philosophical Transactions, A, 187:253-318, 1896.

⁶ Walker, op. cit., p. 111.
limitations.⁷ It has been shown that the
variables eliminated in partial correlation
should not be indirectly related, through
correlation with other variables, to the variables
for which the net relationship is sought.
When this condition is not met, the coefficients
of partial correlation obtained are not
precise measures of the net relationship between
two variables when one or more other
variables are held constant or are eliminated.
Similarly, the coefficient of multiple correlation
does not measure with precision the degree
to which a given variable is affected by
several related variables acting in combination.
Only under unusual circumstances do
the ordinary correlation techniques accomplish
these theoretical purposes.


More adequate techniques have been de- 
vised by Spearman, Kelley, Holzinger, and 
Thurstone. Spearman⁸ observed that when
correlation coefficients representing all of the 
intercorrelations between a number of vari- 
ables are arranged in a table in order of de- 
creasing average magnitude by rows, there us- 
ually occurs a decrease in magnitude of the 
coefficients from left to right. He concluded 
that this condition indicates the presence of a 
general factor underlying the variables. He 
later advocated the use of the tetrad equation 
as a means of demonstrating whether or not 
the correlations between variables are to be 
ascribed to a single common factor. The fol- 
lowing are examples of tetrad equations: 


$$ r_{12} r_{34} - r_{13} r_{24} = 0 $$
$$ r_{12} r_{34} - r_{14} r_{23} = 0 $$
$$ r_{13} r_{24} - r_{14} r_{23} = 0 $$


When ordinary coefficients of correlation 
are substituted and the equations equal zero 
within the limits of the probable error, the 
explanation is that a single factor is common 
to the variables concerned. The following is 
a demonstration of how a single factor can 
cause a tetrad equation to vanish, or equal 
zero. Let us assume that the correlation between
variables X₁ and X₂ is entirely due to
a common factor g. Then, if the effect of g
is removed, the correlation should equal zero.


⁷ Burks, B. S. "On the Inadequacy of the Partial and
Multiple Correlation Technique," Journal of Educational
Psychology, 17:532-40, 625-30, November, December, 1926.

Engelhart, M. D. "The Technique of Path Coefficients,"
Psychometrika, 1:287-293, December, 1936.

Monroe, W. S. and Engelhart, M. D. The Scientific Study
of Educational Problems. New York: The Macmillan Company,
1936, p. 366-400.

⁸ Spearman, C. "General Intelligence Objectively Determined
and Measured," American Journal of Psychology,
15:201-293, 1904.
The ordinary partial correlation formula may
be used:

$$ r_{12.g} = \frac{r_{12} - r_{1g} r_{2g}}{\sqrt{1 - r_{1g}^2}\,\sqrt{1 - r_{2g}^2}} = 0 $$

where $r_{12.g}$ represents the correlation between
$X_1$ and $X_2$ with g eliminated. Then, from the
above,

$$ r_{12} = r_{1g} r_{2g} $$

and similarly

$$ r_{34} = r_{3g} r_{4g} $$
$$ r_{13} = r_{1g} r_{3g} $$
$$ r_{24} = r_{2g} r_{4g} $$

Substituting in the first tetrad equation,

$$ r_{1g} r_{2g} r_{3g} r_{4g} - r_{1g} r_{3g} r_{2g} r_{4g} = 0 $$
Hence, a single common factor is sufficient
to explain the vanishing of a tetrad.⁹ Conversely,
the discovery of the fact that numerous
tetrad equations vanish within the limits
of probable error has been offered in support
of the theory that most intellectual traits are
due to a common factor plus certain specific
factors. The specific factor is unique to a
given test and, hence, is not involved in a
correlation coefficient or in the tetrad equation.
Spearman also derived a formula for
use in determining the extent to which a given
variable correlates with the hypothetical common
factor.
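
The vanishing of the tetrads under a single common factor may be checked numerically. The following is a sketch only: the four loadings, the added correlation of .10 between the first two tests, and the function name are hypothetical values chosen for illustration.

```python
import numpy as np


def tetrad_differences(r, i=0, j=1, k=2, l=3):
    """Return the three tetrad differences for tests i, j, k, l."""
    t1 = r[i, j] * r[k, l] - r[i, k] * r[j, l]
    t2 = r[i, j] * r[k, l] - r[i, l] * r[j, k]
    t3 = r[i, k] * r[j, l] - r[i, l] * r[j, k]
    return t1, t2, t3


# Hypothetical correlations of four tests with a single common factor g.
g = np.array([0.9, 0.8, 0.7, 0.6])

# With one common factor, every correlation is the product of two loadings.
one_factor = np.outer(g, g)
np.fill_diagonal(one_factor, 1.0)

# Add a hypothetical group factor shared only by the first two tests.
with_group_factor = one_factor.copy()
with_group_factor[0, 1] += 0.1
with_group_factor[1, 0] += 0.1

print(tetrad_differences(one_factor))         # all three vanish
print(tetrad_differences(with_group_factor))  # the first two no longer vanish
```

In practice the observed tetrads would be judged against their probable errors rather than against exact zero.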
Use of the tetrad equations revealed that,
with correlations between certain traits, the
tetrads do not vanish. The explanation offered
was to the effect that factors other than
the single common factor are present, that is,
factors not unique to each of the variables nor
common to all of them, but of some degree of
generality. Techniques were devised for use
in isolating these group factors.¹⁰
Kelley,¹¹ Holzinger,¹² Hotelling,¹³ and


⁹ Kelley has shown that certain combinations of several factors
can theoretically cause a tetrad equation to vanish, but
the existence of such combinations seems improbable.
Kelley, T. L. Crossroads in the Mind of Man. Stanford
University, California: Stanford University Press, 1928.

¹⁰ For an example of their use, see: Cairns, George J. "An
Analytical Study of Mathematical Abilities," Catholic University
of America, Educational Research Monographs, Vol. 6,
No. 3. Washington: Catholic Education Press, April, 1931.

)4 p 

¹¹ Kelley, op. cit. Kelley, T. L. Essential Traits of Mental
Life. Harvard Studies in Education. Vol. 26. Cambridge,
Massachusetts: Harvard University Press, 1935. 146 p.

¹² Holzinger, Karl J. Preliminary Report on Spearman-Holzinger
Unitary Trait Study. Chicago: Department of Education,
University of Chicago, 1934, 1935, and 1936. (Reports 1-7)

 Hotelling, Harold. “Analysis of a Complex of Statisti- 
Thurstone¹⁴ have attacked the problem of
analyzing correlations in an effort to determine
the underlying factors. The research of
Thurstone has led to procedures which seem
most effective in solving the problems. By
means of his "centroid" method, one can ascertain
how many basic factors underlie the
intercorrelations of a number of traits. If
only a single factor is present, his procedure
gives the same results as the procedure of
Spearman. If, however, several common factors
are present, i.e., factors of varying degrees
of generality, the procedure gives the
correlations of the original variables with
these factors. However, the solution thus ob- 
tained is not unique. Other solutions can be 
obtained; hence, further mathematical tech- 
niques are used in order to secure psychologi- 
cally meaningful results. 

The centroid method was used in this study 
with the intercorrelations calculated from the 
comprehensive examination data.¹⁵ The
writer did not feel that his data justified the 
continuation of the analysis beyond the iden- 
tification of a single common factor. As will 
be indicated later, it is probable that minor 
factors common to two or more of the vari- 
ables are present, particularly in the situation 
where intelligence is included as one of the 
original traits. The possible existence of 
these factors does not limit the conclusion 
that achievement in any given survey course 
is largely due to ability or abilities which 
function in all of the surveys. 

In a preceding paragraph it was shown that
the correlation between two traits is equal to
the product of the correlations between each
of the traits and a single common factor, i.e.,
$r_{12} = r_{1g} r_{2g}$. Where there are several common
factors 1, 2, 3, 4, ..., r, any correlation
$r_{jk}$ may be expressed,¹⁶

$$ r_{jk} = a_{j1} a_{k1} + a_{j2} a_{k2} + \dots + a_{jr} a_{kr} \tag{1} $$

where $a_{j1}$ refers to the correlation between
trait j and the hypothetical common factor 1;
$a_{k1}$ refers to the correlation between trait k

¹⁴ Thurstone, L. L. The Theory of Multiple Factors.
Ann Arbor, Michigan: Edwards Brothers, Inc., June, 1932.
65 p.

Thurstone, L. L. A Simplified Multiple Factor Method and
an Outline of the Computations. Ann Arbor, Michigan: Edwards
Brothers, Inc., 1933, 26 p.

Thurstone, L. L. The Vectors of Mind. Chicago: University
of Chicago Press, 1935.

¹⁵ Spearman's formula for the correlation of a test with a
single common factor was used as a check.

¹⁶ See Thurstone, The Vectors of Mind. Chicago:
University of Chicago Press, 1935, p. 92.
The derivation given here in explanation of the centroid
method closely follows that given by Thurstone.
and the factor 1. Similarly, $a_{j2}$ and $a_{k2}$ refer
to the correlations with a second factor. The
symbols $a_{jr}$ and $a_{kr}$ have the same meaning
for the last, or r, factor common to the tests.
These factors, while correlated with the traits
j and k, are themselves uncorrelated. A similar
equation may be written in symbol form
for each of the correlations in a table of all of 
the intercorrelations between the given tests. 
The subscript j stands for any row, and k 
stands for any column; since such tables are 
symmetrical, they refer successively to the 
same traits and in the same order. 


Geometrically, the numerical values of the 
a's depend on the locations of orthogonal ref- 
erence vectors, since each a can be represented 
as the projection of a vector which stands as 
a trait, or test variable, on the vector which 
represents the given common factor, which 
may be termed an “orthogonal” reference 
vector. The term “orthogonal” means that 
the reference, or factor, vectors are at right 
angles to each other—a configuration which 
denotes their lack of intercorrelation. There 
are as many of these reference vectors as 
there are common factors. If there are two 
such factors, the geometric representation is a 
plane; if there are three factors, the repre- 
sentation is in three dimensions; and if there 
are more than three factors, there are more 
than three dimensions, and the geometry is 
that of hyperspace. 


The centroid method yields for each trait, 
or test variable, a series of factor loadings. 
These loadings are the a’s, that is, the correla- 
tions of each test with the first factor, with 
the second factor, and so on. Since, however, 
the values of the a’s depend upon the locations 
in space of the reference vectors which repre- 
sent the common factors, and because the cen- 
troid method does not yield a unique solution 
in problems where more than one factor is in- 
volved, it is usually necessary to “rotate” the 
reference vectors obtained from a centroid 
solution. This is done to secure results which 
have psychological meaning. Rotation of the
reference vectors, or coordinate axes of the
system, does not affect the values which may be
calculated for $r_{jk}$. In other words, the centroid
factor loadings, or the loadings secured
with reference to new coordinate axes, may be
substituted in the equation given for $r_{jk}$ and
the original table of correlations reproduced,
an excellent check on the method.
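
The check just mentioned can be illustrated for two common factors. The sketch below assumes a hypothetical table of loadings for four tests and an arbitrary rotation of 30 degrees; none of these values comes from the study.

```python
import numpy as np

# Hypothetical loadings of four tests on two uncorrelated common factors.
loadings = np.array([
    [0.8,  0.3],
    [0.7,  0.4],
    [0.6, -0.2],
    [0.5, -0.5],
])

# Orthogonal rotation of the two reference vectors through 30 degrees.
theta = np.deg2rad(30.0)
rotation = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta),  np.cos(theta)],
])
rotated_loadings = loadings @ rotation

# r_jk = a_j1*a_k1 + a_j2*a_k2; the diagonal cells hold the communalities.
reproduced = loadings @ loadings.T
reproduced_after_rotation = rotated_loadings @ rotated_loadings.T

# The two tables agree although the individual loadings differ.
print(np.allclose(reproduced, reproduced_after_rotation))
```

The individual loadings change under rotation, but every sum of products of loadings, and hence every reproduced correlation, is unchanged.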
Let

$$ r_{jk} = a'_{j1} a'_{k1} + a'_{j2} a'_{k2} + a'_{j3} a'_{k3} + \dots + a'_{jr} a'_{kr} \tag{2} $$

where the primes refer to loadings obtained
from the centroid solution.

The vectors may be visualized as originating
in a point, the origin. The reference vector
representing the first common factor
passes through the origin also and through the
centroid or, in terms of physics, the center of
gravity of the points which terminate the test
vectors. When this is the case, the projections
on the reference vector are a maximum,
and the reference vector has zero projections
on the rest of the orthogonal reference vectors
representing other factors. The equation just
given may be summed for all tests j in column
k of the correlation table:


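$$ \sum_{j=1}^{n} r_{jk} = a'_{k1} \sum_{j=1}^{n} a'_{j1} + a'_{k2} \sum_{j=1}^{n} a'_{j2} + \dots + a'_{kr} \sum_{j=1}^{n} a'_{jr} \tag{3} $$

and summing for all columns of the correlation
table:

$$ \sum_{k=1}^{n} \sum_{j=1}^{n} r_{jk} = \sum_{k=1}^{n} a'_{k1} \sum_{j=1}^{n} a'_{j1} + \sum_{k=1}^{n} a'_{k2} \sum_{j=1}^{n} a'_{j2} + \dots + \sum_{k=1}^{n} a'_{kr} \sum_{j=1}^{n} a'_{jr} \tag{4} $$

Since the correlation table is symmetric, i.e.,
each row has its corresponding column,

$$ \sum_{k=1}^{n} a'_{km} = \sum_{j=1}^{n} a'_{jm} \tag{5} $$

then

$$ \sum_{k=1}^{n} \sum_{j=1}^{n} r_{jk} = \left( \sum_{j=1}^{n} a'_{j1} \right)^2 + \left( \sum_{j=1}^{n} a'_{j2} \right)^2 + \dots + \left( \sum_{j=1}^{n} a'_{jr} \right)^2 \tag{6} $$

The co-ordinate of the centroid of the termini
of the test vectors, measured along the reference
vector representing common factor 1,
is equal to

$$ \frac{1}{n} \sum_{j=1}^{n} a'_{j1} $$

where n is the number of tests.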
One can assume that the system has been
so rotated that the centroid lies in the first
axis of reference, the vector representing the
common factor. This centroid has zero
projections on the other orthogonal reference
vectors, or axes. Hence

$$ \sum_{j=1}^{n} a'_{jm} = 0 \quad (m = 2, 3, \dots, r) \tag{7} $$

$$ \sum_{k=1}^{n} \sum_{j=1}^{n} r_{jk} = \left( \sum_{j=1}^{n} a'_{j1} \right)^2 = r_t \tag{8} $$

in which $r_t$ equals the summation of all the
coefficients of correlation in the table including
the diagonal terms.¹⁷
Substituting (7) in (3), the summation of
correlation coefficients in a given column

$$ \sum_{j=1}^{n} r_{jk} = a'_{k1} \sum_{j=1}^{n} a'_{j1} \tag{9} $$

and, from (8),

$$ \sum_{j=1}^{n} a'_{j1} = \sqrt{r_t} \tag{10} $$

Hence, substituting in (9)

$$ r_k = a'_{k1} \sqrt{r_t} \tag{11} $$

where $r_k$ represents the sum of the coefficients
in a given column. Hence

$$ a'_{k1} = \frac{r_k}{\sqrt{r_t}} \tag{12} $$

Thus one can determine for each test, represented
by a column in the correlation table,
the correlation of that test with the first
common factor, since $r_t$ can be obtained by
summing the entire table and $r_k$ can be obtained
by summing the given column.


¹⁷ [...] the diagonal terms [...] Thurstone's text. In this
study the highest correlation in a column was taken as the
estimated communality, and inserted in the diagonal cell.
The true communality is equal to the sum of the squares of
the factor loadings of the common factors in any test, i.e.,
factors that the test has in common with other tests.
Communality of test j: $h_j^2 = a_{j1}^2 + a_{j2}^2 + \dots + a_{jr}^2$.


One can then remove from each of the original
coefficients of correlation the part of the
correlation which is due to the first common
factor and thus obtain the residual correlation,
i.e., that part of the correlation which
is due to other common factors. From
equation (2)

$$ r_{1.jk} = r_{jk} - a'_{j1} a'_{k1} \tag{13} $$

where $r_{1.jk}$ represents a given residual correlation
corresponding to the original correlation
$r_{jk}$, $a'_{j1}$ represents the correlation of
trait j with common factor 1, and $a'_{k1}$ represents
the correlation of trait k with common
factor 1. This has been done in this study.
From the residual correlations, by a process
identical with that expressed by equation
(12), the correlation of each test with a second
factor can be obtained. However, since
the centroid previously mentioned has zero
projections on the second, third, and other
reference vectors, it is at the origin in the
(r - 1) subspace¹⁸ and must be removed
from this origin. This is accomplished by a
process described by Thurstone, and need not
be explained here.¹⁹ The process of extracting
factors is repeated until the residual correlations
are negligible in size.
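
The first step of the process, equations (12) and (13), may be sketched as follows. The sketch is a hedged illustration, not the computing routine actually employed: the function names are hypothetical, and the demonstration table is built from assumed loadings of .8, .7, .6, and .5 on a single factor with the true communalities in the diagonal, so that the loadings are recovered exactly.

```python
import numpy as np


def insert_estimated_communalities(r):
    """Put the highest correlation of each column into its diagonal cell."""
    r = np.array(r, dtype=float)
    np.fill_diagonal(r, -np.inf)   # ignore the diagonal while searching
    highest = r.max(axis=0)        # highest correlation in each column
    np.fill_diagonal(r, highest)
    return r


def first_centroid_factor(r):
    """First-factor loadings by equation (12), residuals by equation (13)."""
    r = np.asarray(r, dtype=float)
    r_k = r.sum(axis=0)            # sum of each column, diagonal included
    r_t = r.sum()                  # sum of the entire table
    loadings = r_k / np.sqrt(r_t)
    residuals = r - np.outer(loadings, loadings)
    return loadings, residuals


# Hypothetical one-factor table; the diagonal holds the true communalities.
true_loadings = np.array([0.8, 0.7, 0.6, 0.5])
table = np.outer(true_loadings, true_loadings)

loadings, residuals = first_centroid_factor(table)
estimated = insert_estimated_communalities(table)

print(loadings)              # recovers the assumed loadings
print(np.diag(estimated))    # estimated communalities from column maxima
```

With estimated rather than true communalities in the diagonal, as in this study, the loadings are approximations and the residuals are small rather than exactly zero.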


The First Sample 

In Table I are given the coefficients of cor- 
relation calculated from the first sample of 
data. 

Through use of the process represented by
equation (12) the following correlations of
each of the tests with the first common factor
were calculated:

English 101-102
Social Science 101-102
Social Science 201-202
Humanities 201-202
Biological Science 101-102
Physical Science 101-102

¹⁸ A space from which one factor has been removed.
It was not at the origin in the r space.

¹⁹ See Thurstone [...]

TABLE I

                              English   Social Science   Social Science   Humanities   Biological Science
                              101-102   101-102          201-202          201-202      101-102
Social Science 101-102         .62
Social Science 201-202         .63
Humanities 201-202             .61
Biological Science 101-102     .66
Physical Science 101-102       .56

.76 
.66 
The residual correlations were then calculated
by means of equation (13). These are
given in Table II.

None of the residuals is significant, a fact
which indicates that the structure of each of
the variables is that of a factor common to all
of the variables, plus a factor specific to that
variable. Calculation of the correlations of
each of the tests with the common factor by
means of the Spearman method,²⁰ which gives
identical results only when one common factor
is present, gave the following values:


English 101-102 ................ .76
Social Science 101-102 ......... .84
Social Science 201-202 ......... .85
Humanities 201-202 ............. .80
Biological Science 101-102 ..... .85
Physical Science 101-102 __-_~_~- ms 
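
Spearman's formula for the correlation of test a with the common factor, the square root of (A² - A') / (T - 2A), may be sketched as below. Two assumptions are made and should be noted: T is taken as the sum of all the off-diagonal cells of the symmetric table (each pair counted twice), and the demonstration table is a hypothetical one-factor table, not Table I.

```python
import numpy as np


def spearman_loading(r, a):
    """Correlation of test a with the single common factor (Spearman)."""
    off = np.array(r, dtype=float)
    np.fill_diagonal(off, 0.0)       # the diagonal cells are not used
    A = off[a].sum()                 # sum of correlations of a with the others
    A_prime = (off[a] ** 2).sum()    # sum of the squares of these correlations
    T = off.sum()                    # assumed: every pair counted twice
    return np.sqrt((A ** 2 - A_prime) / (T - 2 * A))


# Hypothetical correlations of four tests with a single common factor.
g = np.array([0.9, 0.8, 0.7, 0.6])
table = np.outer(g, g)
np.fill_diagonal(table, 1.0)

estimates = [spearman_loading(table, a) for a in range(len(g))]
print(estimates)  # reproduces the assumed values when one factor is present
```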


The Second Sample
In Table III are given the intercorrelations
of the six comprehensive examinations with
each other and with intelligence as measured
by the Psychological Examination of the
American Council on Education.

²⁰ Spearman, C. The Abilities of Man. New York: The
Macmillan Company, 1927, appendix, p. xvi, equation 21.
The formula is

$$ r_{ag} = \sqrt{\frac{A^2 - A'}{T - 2A}} $$

A is the sum of the intercorrelations between test a and every
other test of the group, A' is the sum of the squares of these
correlations, and T is the total of all intercorrelations.

The correlations of each test with the common
factor, calculated by means of equation
(12), are as follows:


English 101-102 ................ .72
Social Science 101-102 ......... .82
Social Science 201-202 ......... .81
Humanities 201-202 ............. .80
Biological Science 101-102 ..... .82
Physical Science 101-102 ....... .81
Intelligence ................... .44


Calculation by the other method referred to
gives the following values:


English 101-102 ................ .73
Social Science 101-102 ......... .82
Social Science 201-202 ......... .78
Humanities 201-202 ............. .78
Biological Science 101-102 ..... .82
Physical Science 101-102 ....... .81
OS eae 


In Table IV are given the residual corre- 
lations calculated through use of the first set 
of values. 

There are somewhat greater discrepancies 
between the two series of values, and the 


TABLE II

                                       Social   Social              Biological
                             English   Science  Science  Humanities  Science
                             101-102   101-102  201-202   201-202    101-102
Social Science 101-102        -.03
Social Science 201-202        -.02      +.04
Humanities 201-202            -.02      -.07     +.06
Biological Science 101-102    +.01      -.03     -.05      .00
Physical Science 101-102      -.03      +.02     -.07     -.09       +.04


TABLE III

                                       Social   Social              Biological  Physical
                             English   Science  Science  Humanities  Science    Science
                             101-102   101-102  201-202   201-202    101-102    101-102
Social Science 101-102         .58
Social Science 201-202         .49       .72
Humanities 201-202             .54       .70      .80
Biological Science 101-102     .53       .66      .61       .65
Physical Science 101-102       .57       .65      .60       .54        .74
Intelligence                   .46       .23      .19       .17        .34        .40

TABLE IV

                                       Social   Social              Biological  Physical
                             English   Science  Science  Humanities  Science    Science
                             101-102   101-102  201-202   201-202    101-102    101-102
Social Science 101-102         .00
Social Science 201-202        -.09      +.06
Humanities 201-202            -.04      +.05     +.15
Biological Science 101-102    -.06       .00     -.05      -.01
Physical Science 101-102      -.01      -.01     -.06      -.11       +.08
Intelligence                  +.14      -.13     -.17      -.18       -.02       +.04

residuals are slightly larger. Hence, when intelligence is included as a variable, a single common factor is not sufficient to explain entirely the original correlations. Other, though minor, common factors exist.
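The residuals discussed above can be checked with a short sketch. Equation (13) is not reproduced in this part of the paper, so the code assumes that a residual is the observed intercorrelation minus the product of the two tests' correlations with the common factor; the function and variable names are illustrative only. The loadings and intercorrelations are those printed for the second sample.

```python
# Correlations with the common factor (second sample, first set of values).
loadings = {
    "Social Science 101-102": 0.82,
    "Social Science 201-202": 0.81,
    "Biological Science 101-102": 0.82,
    "Physical Science 101-102": 0.81,
    "Intelligence": 0.44,
}

# A few observed intercorrelations read from Table III.
observed = {
    ("Social Science 201-202", "Social Science 101-102"): 0.72,
    ("Biological Science 101-102", "Social Science 201-202"): 0.61,
    ("Physical Science 101-102", "Biological Science 101-102"): 0.74,
    ("Intelligence", "Physical Science 101-102"): 0.40,
}


def residual(r_ab, g_a, g_b):
    """ASSUMED form of equation (13): what one common factor leaves unexplained."""
    return r_ab - g_a * g_b


residuals = {
    pair: round(residual(r, loadings[pair[0]], loadings[pair[1]]), 2)
    for pair, r in observed.items()
}

for (test_a, test_b), value in residuals.items():
    print(f"{test_a} x {test_b}: {value:+.2f}")
```

Under this assumption the four values agree with the corresponding entries of Table IV (+.06, -.05, +.08, and +.04).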
The correlations between the tests and the common factor when intelligence is not included as a variable are as follows: (The first column refers to the centroid method of calculation, and the second column to Spearman's method.)

English 101-102 .................... [illegible]   [illegible]
Social Science 101-102 ............. [illegible]   [illegible]
Social Science 201-202 ............. .84           .82
Humanities 201-202 ................. [illegible]   .82
Biological Science 101-102 ......... [illegible]   .81
Physical Science 101-102 ........... .80           .78


Apparently, one factor is sufficient to explain the intercorrelations between the scores on the comprehensive examinations.

Conclusions

The common factor may represent a complex of abilities. It has something in common with what is measured by an intelligence test, but it is not restricted to intelligence as measured. This was shown by the fact that the typical comprehensive examination correlates .80 with the common factor, while intelligence correlates .44. It is possible that the common factor includes such traits as “perseverance” or other attributes of character which may compensate for a lack of intelligence. One can interpret the correlation of a comprehensive examination with the common factor in terms of individual differences. It may be said that approximately 65 per cent of the variance (or less precisely, variation in achievement) is due to the common factor.²¹ Assuming a coefficient of reliability of .90 for the typical comprehensive examination, 25 per cent of the variance, or variation in achievement, may be ascribed to minor general factors, or to abilities specific to the given survey.
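The arithmetic behind these percentages can be sketched as follows. The function name is hypothetical, and the split is an assumption about the reasoning: the squared correlation with the common factor is taken as the common share of the variance, the reliability as the total dependable share, and the difference as the share left to minor general or specific abilities.

```python
def variance_shares(loading, reliability):
    """Split unit variance into common, other-reliable, and error parts.

    ASSUMPTION: common share = loading squared; the reliable variance not
    accounted for by the common factor is reliability minus that share.
    """
    common = loading ** 2
    other_reliable = reliability - common
    error = 1.0 - reliability
    return common, other_reliable, error


common, other_reliable, error = variance_shares(loading=0.80, reliability=0.90)
print(f"common factor:            {common:.2f}")
print(f"minor general / specific: {other_reliable:.2f}")
print(f"error:                    {error:.2f}")
```

With the rounded loading of .80 the common share comes to .64 and the remainder to .26, close to the 65 and 25 per cent quoted in the text, which presumably rest on unrounded values.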

The results of this study support the con- 
tention that the survey courses qualify as sur- 
vey courses in that the abilities required are 
largely common. While special talent may be 
required for superlative achievement in a 
given survey course, average, or even superior, 
achievement is possible for the student of 
average or superior status with respect to the 
general ability. Conversely, satisfactory 
achievement in a given survey course is pos- 
sible for any student whose achievement in 
the other surveys is satisfactory. If a stu- 
dent’s achievement in one survey course is 
unsatisfactory while his achievement in the 
others is satisfactory, the failure is probably 
the result of lack of interest because of in- 
adequate motivation rather than the result of 
lack of some special ability. 

²¹ For a discussion of the technique [illegible] interpretation of coefficients of correlation, see Engelhart, M. D. “The Technique of Path Coefficients,” Psychometrika, 1:287-293, December, 1936.




















SOCIAL COMPETENCE OF GRADE SCHOOL CHILDREN 


KATHERINE P. BRADWAY, M. A.*
Research Assistant, The Training School at Vineland, New Jersey 


Tests of verbal intelligence and scholastic 
achievement have been used as criteria for the 
classification of grade school children when 
the materials of instruction have been essen- 
tially academic. However, since modern edu- 
cation emphasizes the preparation of the child 
for daily living, and organizes subject matter 
in terms of interests, activities, and social 
needs, it is becoming desirable to include in 
the classification criteria measures of social 
attainment. The purpose of the present
paper is to call attention to a recently devel- 
oped scale for measuring social maturity, and 
to report the results of administering the scale 
to a group of three hundred children attend- 
ing grade school. 

The Vineland Social Maturity Scale meas- 
ures social development in terms of personal 
independence and responsibility. In infancy 
and early childhood social maturity is re- 
flected in self-help, at adolescence in self- 
direction, and in adult life as assumption of 
responsibility for others. The successive 
items of this social scale represent progressive 
maturation in self-help, self-direction, social 
relations, locomotion, occupation, and com- 
munication. The items are divided into age 
groups representing increasing degrees of so- 
cial competence. These genetic levels of per- 
formance are considered as successive stages 
of social maturation. 


VINELAND SOCIAL MATURITY SCALE*


Items                                                  Years

0-I
  1. “Crows”; laughs
  2. Balances head
  3. Grasps objects within reach
  4. Reaches for familiar persons
  5. Rolls over
  6. Reaches for nearby objects
  7. Occupies self unattended
  8. Sits unsupported
  9. Pulls self upright
 10. “Talks”; imitates sounds
 11. Drinks from cup or glass assisted
 12. Moves about on floor
 13. Grasps with thumb and finger
 14. Demands personal attention
 15. Stands alone
 16. Does not drool
 17. Follows simple instructions

I-II
 18. Walks about room unattended
 19. Marks with pencil or crayon
 20. Masticates food
 21. Pulls off socks
 22. Transfers objects
 23. Overcomes simple obstacles
 24. Fetches or carries familiar objects
 25. Drinks from cup or glass unassisted
 26. Gives up baby carriage
 27. Plays with other children
 28. Eats with spoon
 29. Goes about house or yard
 30. Discriminates edible substances
 31. Uses names of familiar objects
 32. Walks upstairs unassisted
 33. Unwraps candy
 34. Talks in short sentences

II-III
 35. Asks to go to toilet
 36. Initiates own play activities
 37. Removes coat or dress
 38. Eats with fork
 39. Gets drink unassisted
 40. Dries own hands
 41. Avoids simple hazards
 42. Puts on coat or dress unassisted
 43. Cuts with scissors
 44. Relates experiences

III-IV
 45. Walks downstairs one step per tread
 46. Plays cooperatively at kindergarten level
 47. Buttons coat or dress
 48. Helps at little household tasks
 49. “Performs” for others
 50. Washes hands unaided

IV-V
 51. Cares for self at toilet
 52. Washes face unassisted
 53. Goes about neighborhood unattended
 54. Dresses self except tying
 55. Uses pencil or crayon for drawing
 56. Plays competitive exercise games

V-VI
 57. Uses skates, sled, wagon
 58. Prints simple words
 59. Plays simple table games
 60. Is trusted with money
 61. Goes to school unattended

VI-VII
 62. Uses table knife for spreading
 63. Uses pencil for writing
 64. Bathes self assisted
 65. Goes to bed unassisted

VII-VIII
 66. Tells time to quarter hour
 67. Uses table knife for cutting
 68. Disavows literal Santa Claus
 69. Participates in pre-adolescent play
 70. Combs or brushes hair

VIII-IX
 71. Uses tools or utensils
 72. Does routine household tasks
 73. Reads on own initiative
 74. Bathes self unaided

IX-X
 75. Cares for self at table
 76. Makes minor purchases
 77. Goes about home town freely

X-XI
 78. Writes occasional short letters
 79. Makes telephone calls
 80. Does small remunerative work
 81. Answers ads; purchases by mail

XI-XII
 82. Does simple creative work
 83. Is left to care for self or others
 84. Enjoys books, newspapers, magazines

XII-XV
 85. Plays difficult games
 86. Exercises complete care of dress
 87. Buys own clothing accessories
 88. Engages in adolescent group activities
 89. Performs responsible routine chores

XV-XVIII
 90. Communicates by letter
 91. Follows current events
 92. Goes to nearby places alone
 93. Goes out unsupervised daytime
 94. Has own spending money
 95. Buys all own clothing

XVIII-XX
 96. Goes to distant points alone
 97. Looks after own health
 98. Has a job or continues schooling
 99. Goes out nights unrestricted
100. Controls own major expenditures
101. Assumes personal responsibility

XX-XXV
102. Uses money providently
103. Assumes responsibilities beyond own needs
104. Contributes to social welfare
105. Provides for future

XXV+
106. Performs skilled work
107. Engages in beneficial recreation
108. Systematizes own work
109. Inspires confidence
110. Promotes civic progress
111. Supervises occupational pursuits
112. Purchases for others
113. Directs or manages affairs of others
114. Performs expert or professional work
115. Shares community responsibility
116. Creates own opportunities
117. Advances general welfare


* The author acknowledges the assistance of Dr. Edgar A. Doll in the treatment of data and preparation of manuscript.


Although patterned in principle after the 
Binet Scale for measuring intelligence, this 
social scale does not require direct examina- 
tion of the subject. The method provides in- 
stead for a record of habitual performances 
obtained by interviewing someone intimately 
familiar with the person examined. The final 
score is reckoned from the total number of 
items successfully performed, and this score 
may be converted to a social age score by re- 
ferring to the scale directly without reference 
to a table of norms. A social quotient is ob- 
tained by dividing the social age by the life 
age up to life age 25. For adults older than 25 years, the divisor remains 25, just as the divisor for Binet intelligence quotients remains 14 (or 16) after the age of 14 (or 16).
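The computation of the social quotient just described may be sketched as follows. The function name is hypothetical, and multiplying the ratio by 100 is an assumption; it is consistent with Table 1, where a mean social age of 9.0 and a mean life age of 10.2 accompany a social quotient of 88.

```python
def social_quotient(social_age, life_age, ceiling=25.0):
    """Social age divided by life age, with the divisor capped at 25 years.

    ASSUMPTION: the quotient is expressed on a base of 100, as with the IQ.
    """
    if life_age <= 0:
        raise ValueError("life age must be positive")
    divisor = min(life_age, ceiling)
    return 100.0 * social_age / divisor


# A child of life age 10.2 with a social age of 9.0.
print(round(social_quotient(9.0, 10.2)))

# For an adult of 40 the divisor remains 25.
print(round(social_quotient(25.0, 40.0)))
```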


The validation and normal standardization 
of the scale are reported in detail elsewhere 
(7). Its reliability on re-examination is high 
(r = .93), and its validity as determined by
correlations between estimates and obtained 
scores is also high (r= .85). The scale is 
relatively free from sex differences, and per- 
formance on the scale is not correlated with 
social status, except for the relation of the 
latter to intelligence. 


Application of the Scale

Dr. Roy F. Street, Director of Mental Hygiene, Anne J. Kellogg School, Battle Creek, Michigan, generously put at our disposal some early results he obtained by administering the Vineland Social Maturity Scale to children at the Anne J. Kellogg School. This school provides for classifying children within grades according to the individual abilities of the children. In addition to the regular classes there are several special classes. The data which Dr. Street referred to us included life ages, social maturity scores*, Binet mental ages, and percentile rankings on the Pintner Pupil Portrait test, for 310 children distributed in the following classes: regular classes of grades 4, 5, and 6; slow to retarded classes of grades 4, 5, and 6; retarded classes of grades 7 and 8; gifted classes of grades 5, 6, 7, and 8; remedial classes of grades 4, 5, and 6; and open-air classes of grades 4, 5, and 6. The non-conformity of the groups for which we have results limits the study somewhat, but the data are adequate to show certain trends.

* The first published form of the Social Maturity Scale (Form A) was used in obtaining the social maturity scores. However, the 1936 norms were used in converting social scores to social ages. It should be noted here, too, that the item differences between Form A and the revised form (Form B) are minor at the level at which grade school children score.

Study of the material revealed that its 
treatment would throw light on three prob- 
lems which were important in relation to the 
possibility of using the scale as a criterion in 
educational classification. These problems 
were: (1) differences in social maturity of 
children in retarded, regular, and gifted 
classes; (2) relation between social age and 
mental age; (3) relation between social com- 
petence and personality as measured by the 
Pintner Pupil Portrait test. 


Social Maturity in Relation to Type of Class 


Table 1 shows the means and extreme deviations of life age, social age, social quotient, mental age, and intelligence quotient for the retarded, regular, and gifted classes. Table 2 shows corresponding means for the remedial and open-air classes.


TABLE 1

MEANS AND EXTREME DEVIATIONS FOR RETARDED, REGULAR, AND GIFTED CLASSES

                              MEANS
                   N     LA      SA      SQ      MA      IQ
Grade 4
  Slow to Ret.    13    10.2     9.0     88      8.3     82
  Regular         41     9.5    10.3    108      9.9    104
Grade 5
  Slow to Ret.     9    11.1     9.0     81      8.4     76
  Regular         42    10.5    11.1    106     10.8    103
  Gifted          18    10.2    12.2    120     13.2    129
Grade 6
  Slow to Ret.    15    12.2     9.2     75      8.7     71
  Regular         43    11.6    11.6    100     11.7    101
  Gifted          13    11(?)   12.5    118     15.3    138
Grade 7
  Retarded         9    13.2     9.7     73      9.7     73
  Gifted          17    12.6    14.0     12(?)  16.6    132
Grade 8
  Retarded        12    14.7    10.5     71     10.1     73
  Gifted          19    13.2    15.6    118     16.8    128

                              EXTREME DEVIATIONS
                  LA            SA           SQ         MA           IQ
Grade 4
  Slow to Ret.    9.8-10.8      7.6-10.8     73-100     6.8-9.4      67-(?)
  Regular         8.8-10.7      7.8-12.0     85-130     8.3-12.4     89-(?)
Grade 5
  Slow to Ret.    10.5-11.(?)   6.8-11.0     60-104     6.7-10.4     57-99
  Regular         9.7-11.6      8.5-14.0     80-135     9.3-13.0     88-115
  Gifted          9.4-10.9      11.0-14.0    106-149    12.3-13.8    126-(?)
Grade 6
  Slow to Ret.    11.1-12.9     5.2-13.0     46-103     7.3-10.9     52-87
  Regular         10.9-12.8     10.0-15.0    84-132     9.5-14.0     (?)
  Gifted          9.8-11.6      11.0-14.0    104-127    13.3-18.7    124-158
Grade 7
  Retarded        12.4-13.8     5.8-12.0     42-87      7.4-13.0     60-(?)
  Gifted          12.1-13.2     11.7-16.0    91-130     15.4-18.0    105-(?)
Grade 8
  Retarded        10.3-16.3     8.0-12.0     53-102     8.5-11.7     56-112
  Gifted          11.9-13.6     14.0-17.0    102-129    15.6-18.0    118-131


TABLE 2

MEANS FOR REMEDIAL AND OPEN-AIR CLASSES

                   N     LA      SA      SQ      MA      IQ
Grade 4
  Remedial        11     9.(?)   9.4    100      9.7    104
  Open-Air         7     9.7     9.2      9(?)   8.9     92
Grade 5
  Remedial        13     9.9    10.6    107     10.2    103
  Open-Air        13    10.9    11.4    105     10.1     93
Grade 6
  Remedial        18    11.9    11.6     98     11.2     94
  Open-Air         8    11.8    12.2    104     10.7     91


The column labelled “SA” in Table 1 indicates that there is a marked relation between social age and the type of class in which a child is placed. The mean SA increases from the retarded to the gifted class for each grade, although the corresponding LA's decrease. It will be noted that the differences between the mean SA's of the retarded and gifted classes of any one grade are all greater than 3 years, whereas the difference between the mean SA of the retarded class of grade 4 and the retarded class of grade 8 is only 1.5 years. The gifted show a more significant increase from grade to grade than do the retarded. The gifted increase from a mean SA of 12.2 at grade 5 to a mean SA of 15.6 at grade 8. In the regular classes there is a 1.3 year increase from grade 4 to grade 6.

The next column, labelled “SQ,” indicates the relation between social quotient, or relative social maturity, and type of class. The mean SQ for the retarded classes drops from 88 to 71, for the regular classes from 108 to 100, and for the gifted classes it varies between 113 and 120. The range of SQ's, presented among the extreme deviations of Table 1, shows that there is no overlapping of the SQ's for the retarded classes with those for the gifted classes of the same grade.

On the basis of these data we may conclude that social maturity varies appreciably with type of class, and to a lesser degree with school grade. This, of course, merely reflects the influences operative in producing the classification of the pupils in the existing system.

Similar analysis may be made from Table 2. Here there are less consistent differences, possibly due to the smaller numbers of subjects in the several groupings. The differences between the mean SQ's of the two types of classes are not large.


Relation Between Social Age and Mental Age

In considering the use of the Vineland Social Maturity Scale in conjunction with intelligence tests for classification of grade school children, it is important to know the relation between social maturity and intelligence. The close correspondence between the mean mental age and the mean social age for each type of class in successive grades may be noted in Table 1. For the regular classes the mean MA minus the mean SA varies from -.4 to +.1. For the retarded classes the mean MA is below the mean SA, and for the gifted classes the mean MA is above the mean SA. The inferiority of mean MA to mean SA for retarded classes has been reported in previous studies (1 and 9). The direction of these differences is accounted for by the fact that mental age is one of the major criteria for the segregation of children in these classes for the retarded and the gifted.


The relation between social age and mental age for the complete group of 310 subjects (including the remedial and open-air classes) is shown in Table 3. The mean MA and mean SA for the total group are identical, i.e., 11.3. The standard deviation of MA is 20 per cent higher than that of SA. The correlation coefficient for these data was r = .74, which corrected for LA gave the partial correlation r = .72. To avoid the possibility that the inclusion of grades 7 and 8, in which only the extreme classes were represented, was artificially raising the correlation, a second correlation was computed in which the children from these grades were eliminated from the data. The correlation was r = .63, which corrected by partial correlation for LA remained r = .63.
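The correction for life age mentioned above is a first-order partial correlation, which may be sketched as follows. The function name is hypothetical, and the correlations of SA and MA with LA used in the demonstration are invented for illustration only, since the paper does not report them.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


# HYPOTHETICAL correlations of social age and mental age with life age.
r_sa_ma = 0.74   # reported in the text
r_sa_la = 0.30   # assumed for illustration
r_ma_la = 0.25   # assumed for illustration

print(round(partial_correlation(r_sa_ma, r_sa_la, r_ma_la), 2))

# When neither variable is related to life age, the correction changes nothing.
print(partial_correlation(0.63, 0.0, 0.0))
```

The second call illustrates one way a correlation such as r = .63 could remain unchanged: correlations with life age that are near zero leave the partial equal to the original.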


TABLE 3

RELATION BETWEEN SOCIAL AGE AND MENTAL AGE

[Table 3: Relation Between Social Age and Mental Age; axes labelled Mean MA and Mean SA]


The correlation between SA and MA for the population of an institution for the feeble-minded (7) has been found to be r = .85. Since the greater heterogeneity (wider spread) of the institutional group might account for the higher correlation, these correlations must be compared in terms of their standard deviations. The standard error of estimate affords the basis for such a comparison. For both the total grade school group and the grade school group with grades 7 and 8 omitted this value ($\sigma\sqrt{1-r^{2}}$)* was 1.4. For the feeble-minded group the value was 1.5. These are comparable and suggest that social maturity and Binet intelligence are related to the same degree for the feeble-minded as they are for the intellectually “normal.”
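The comparison by standard error of estimate can be sketched as below. The function name is hypothetical, and the standard deviations are invented for illustration, since the paper does not report them; they are chosen only to show how a more widely spread group can have a higher correlation and yet nearly the same error of estimate.

```python
from math import sqrt


def standard_error_of_estimate(sigma, r):
    """Sigma times the square root of (1 - r squared)."""
    return sigma * sqrt(1.0 - r * r)


# HYPOTHETICAL standard deviations (in years); only the r values are from the text.
grade_school = standard_error_of_estimate(sigma=2.1, r=0.74)
institution = standard_error_of_estimate(sigma=2.8, r=0.85)

print(round(grade_school, 1), round(institution, 1))
```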


Relation Between Personality and Social 
Maturity 

Another problem which is important in con- 
sidering the Social Maturity Scale as a classi- 
fication instrument is the relation between 
personality or adjustment and social maturity. 
The Pintner Pupil Portrait test (11) had been 
administered to 278 of the 310 subjects exam- 
ined with the Social Maturity Scale. The 
Pintner test is a self-administering test for school pupils in grades 4 to 8. The test is composed of 100 statements in the form of impersonal descriptions of another child, and the pupil is asked to indicate whether he acts or feels the same or differently about that particular situation or person. Typical statements are: “This child thinks school helps children.” “This child is often blamed for things he does.” The scale is scored according to the number of “correct” responses, 100 being perfect. The “correctness” for good adjustment is based on a validation study made by Pintner and others (10). The higher the score, the better adjusted presumably is the child.

The correlation between SQ and score on the Pintner test for the total group of 278 was found to be r = .36. When this correlation was corrected for life age, the partial correlation became r = .41.

The results indicate a low positive relation 
between Vineland Social Maturity scores and 
Pintner Pupil Portrait test scores, that is, a 
slight relation between relative degree of so- 
cial maturity and adjustment. 


Conclusions 


A study of the results of the application of the Vineland Social Maturity Scale to a group of grade school children in the 4th to the 8th grades has been reported to throw light on the possibility of using this scale in educational classification programs. The results show that there are significant differences in social maturity of retarded, regular, and gifted classes as grouped by existing procedures, that there is a close relation between intelligence and social maturity, and that there is a low positive relation between adjustment and social maturity.

* The average of the σ's for the two distributions was used as the σ in this formula.

The high relation between social maturity and intelligence might seem to indicate that both measures are not needed; that one measure is a sufficient indicator of both. However, the spread of SA's for any one MA, as shown in Table 3, reveals that there would be considerable error of prediction in individual cases, as much as 3 or 4 years. It would, of course, be unwise to use social maturity alone as a criterion for class placement, since degree of intelligence seems to be the more important factor in learning. However, it might be practicable to use the two together. The inclusion of gifted and retarded classes within each grade of a school system would seem to provide a means of adjusting the placement according to both intellectual and social maturity.

The value of the scale in the study of individual problems might also be emphasized. Poor adjustment of some children may be related to discrepancies between social maturity and mental maturity. A child who is intellectually superior to children of his own age, but is not correspondingly superior in social competence, and so is unable to compete socially with children older than himself, may find no group to which he “belongs.” The clear recognition of such a discrepancy between mental and social maturity may facilitate a satisfactory solution of the child's problem.
A STUDY OF OSCILLATION AS A UNITARY TRAIT


MARIAN E. MADIGAN


Milwaukee Vocational School 
Milwaukee, Wisconsin 


INTRODUCTION 


No one can “attend” continuously to a task. With the best of attention output fluctuates. Flugel showed that people who oscillate widely in one thing tend to oscillate widely in all things, and vice versa. It was on the basis of this investigation that Spearman, applying the criterion of tetrad differences, laid claim to a general factor of oscillation, symbolized by him as “o”¹.


THE PROBLEM 


This paper represents an attempt to investi- 
gate rather thoroughly the behavior-unit of 
oscillation. More explicitly stated, its pur- 
pose is to apply the scheme of intercorrela- 
tions, tetrads, and factorial analysis, in con- 
nection with other variables, to ascertain the 
existence of oscillation as a unitary trait. 
Closely akin to the main problem will be the 
consideration of the validity and reliability 
of the oscillation tests, the effect of two meth- 
ods of scoring them, and a comparison of os- 
cillation of children with that of adults. 


Data 

Some thirty variables centering around the 
assumed factors of oscillation, perseveration, 
spatial relations, mental speed, motor speed, 
attention, fluency, and memory were selected. 
This battery was given to 117 adults, the 
majority being graduate students. 

The tests utilized in this study were some of those previously administered to school children in the Spearman-Holzinger unitary traits study². In the oscillation series, Test 42 was made up of clusters of dots, Test 43 utilized capital letters, Test 44 made use of segments of “X” and a “□”, Test 45 consisted of a series of digits, and Test 46 consisted of the segments of “X”, a “□”, and “W”. The first four tests have their content distributed along zigzag lines which twist irregularly back and forth across a page, 18 × 14. The symbols in Test 46 were distributed in straight lines across the page.

In Test 42, the subject was to count the number of dots; in Test 43 he was to distinguish between capital letters made entirely of straight lines and those involving curved lines; the task in Tests 44 and 46 was to identify the symbols to which the segments belonged; and in Test 45 the subject was to add the digits by twos, putting down only the units figure.

¹ C. Spearman, The Abilities of Man. New York: Macmillan Co., 1927, pp. 324-326.

² Preliminary Report on Spearman-Holzinger Unitary Trait Study, No. 1. Prepared at the Statistical Laboratory, Department of Education, University of Chicago, 1934.


Experimental set-up 

Preliminary investigations in administering 
the oscillation tests as group tests had resulted 
in the waste of a large amount of data. In 
order to minimize this loss and secure reliable 
data, the oscillation tests were administered 
individually. The remainder of the battery 
was administered as group tests. 

Since measures of “o” were to be derived from the variation in output of precise units of mental work, an accurate automatic timing device was arranged which clicked off intervals of five seconds. The subject was admonished to work as rapidly as possible in order that speed could be kept constant. He was carefully instructed to put a tick-mark on his paper at the place where he was working when the hammer clicked and then continue working at top speed. In the effort to maintain top speed some of these interval-marks were omitted. It was the business of the examiner to note carefully their omissions and insert them after each test. Good records of responding were thus secured in every case. The administering of the “o” tests individually not only made for better responding, but it allowed the subject to complete each test. A short form, approximately half the content of the long form, preceded each “o” test.


VALIDITY AND RELIABILITY 
Validation 


Internal consistency among supposedly related variables was the basis on which validation was studied. If four or more measures correlate positively, then there is justification for postulating a trait which is indicated by each of the measures³. Intercorrelations for both the short and the long forms were considered.

The average of the ten intercorrelations for the short forms was .2072 and the average for the long forms of the same set was .4190. However, length was not the only factor making for the increase in the intercorrelations. The average intercorrelations showed that a test of 60 five-second intervals yielded as good results as one of 180 five-second intervals in length. Inasmuch as the number of different tasks in each test was relatively few, thereby making for a great deal of repetition throughout the test, the set of “o” tests was considered relatively constant in difficulty. On the assumption that these tests measure oscillation, and the intercorrelations lend support to this, it would seem that the better intercorrelations are due to the content of the tests. Ranking first is Test 43.

Scoring was based on the mean deviation formula and the method of differences. In this latter method, if a subject's output were constant, there would be no differences, but as his output fluctuates the absolute change from interval to interval is summed and divided by the number of intervals for his score. Flugel makes the statement that he adopted this method instead of the mean deviation

“both because of its greater simplicity from the point of view of calculation and because the measure of variability obtained by such a method is relatively unaffected by a constant tendency of the value to rise or fall, such a constant tendency being of course present in our data in the shape of fatigue.”⁴

To the writer this suggested that intercorrelations secured by this method would be higher than by the deviation formula. The comparison was made on the basis of the longer forms because fatigue would be expected to be present to a greater extent than on short forms. The results in Table I do not support the investigator's interpretation.
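The two methods of scoring may be contrasted in a minimal sketch. The function names and the sample outputs are hypothetical; following the description above, the method of differences sums the absolute change from interval to interval and divides by the number of intervals.

```python
def mean_deviation(outputs):
    """Average absolute deviation of the interval outputs from their mean."""
    mean = sum(outputs) / len(outputs)
    return sum(abs(x - mean) for x in outputs) / len(outputs)


def method_of_differences(outputs):
    """Summed absolute change between successive intervals, per interval."""
    changes = [abs(later - earlier) for earlier, later in zip(outputs, outputs[1:])]
    return sum(changes) / len(outputs)


# HYPOTHETICAL output per five-second interval.
steady_decline = [10, 9, 8, 7, 6, 5]    # a constant downward tendency (fatigue)
oscillating = [8, 5, 9, 6, 10, 7]       # fluctuation with no clear trend

for name, record in (("steady decline", steady_decline), ("oscillating", oscillating)):
    print(name, round(mean_deviation(record), 2), round(method_of_differences(record), 2))
```

A steady decline inflates the mean deviation but adds little to the method of differences, which is the property Flugel cites in its favor.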

“Holding amount constant” means that the subject finished the test regardless of time. The “time constant” basis is the time that it took the fastest subject to complete the test. In this situation the results are calculated on the same number of intervals for every subject, the number varying from test to test.

A comparison of the results in Table I shows the averages for the “Time Constant” basis to be lower. That is, the length of the test has been reduced and, in all but three instances, the correlations are lower.

³ L. L. Thurstone, The Reliability and Validity of Tests, p. 101. Ann Arbor, Michigan: Edwards Brothers Inc., 1932.

⁴ J. C. Flugel, Practice, Fatigue, and Oscillation. The British Journal of Psychology Monograph Supplement, No. 13, p. 66. Cambridge University Press, 1928.

From the foregoing results, it is evident that the validity of oscillation tests on the basis of internal consistency depends on length. That is, other things being equal, the longer the tests the better the intercorrelations resulting therefrom. But investigations reveal that beyond 60 five-second time intervals, “o” tests do not yield increasingly better results commensurate with the time and labor involved. Of the two methods of scoring, the results favor the mean deviation method.

Reliability 

Since the “o” tests are the continuous function
type of test, alternate items or intervals
could not be used. Therefore it was neces- 
sary to use the first half of the test and com- 
pare it with the second half. Since the con- 
tent of the tests was of the same level of dif- 
ficulty throughout, it does not seem that it 
would be objectionable to use continuous 
halves for a basis of reliability. 

The primary purpose in administering the 
short forms of each “o” test was to thoroughly
acquaint the subject with the test before tak- 
ing the long form. It so happened that the 
content of the short form was identically re- 
peated in the beginning of the long form. By 
cutting the long tests off to parallel the con- 
tent of the short tests, two identical forms of 
the same test were available. Since the long 
form doubled the content of the short form in 
three tests, this allowed an empirical check on 
the use of the Spearman—Brown formula. 
The other tests were about one and a half 
times their short forms. 
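The Spearman-Brown formula referred to above predicts the reliability of a test lengthened by a given factor. A minimal sketch follows; the function name and the sample reliabilities are hypothetical and are not taken from Table II.

```python
def spearman_brown(r, k=2.0):
    """Predicted reliability when a test of reliability r is lengthened k times."""
    return k * r / (1.0 + (k - 1.0) * r)


# HYPOTHETICAL reliability of a short form.
short_form_r = 0.50

# Long form double the short form, as in three of the tests.
print(round(spearman_brown(short_form_r, k=2.0), 3))

# Long form about one and a half times the short form, as in the other tests.
print(round(spearman_brown(short_form_r, k=1.5), 3))
```

Comparing such predictions with the correlations actually obtained from the long forms is the empirical check the text describes.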

One of the most illuminating bits of evi- 
dence from a study of these results in Table 
II is the fact that the average of the correla- 
tions by identical forms is practically the 
same as the average resulting from the relia- 
bility of the long forms. It is the opinion of 
the writer that such a comparison as this 
yields conclusive evidence bearing on practice 
effects. There was no intention in the mind 
of the examiner to hold practice effects at a 
minimum. In this case they might well be 
considered at a maximum since the short

TABLE I

CORRELATIONS OF “o” TESTS BASED ON TWO DIFFERENT METHODS OF SCORING

Amount Held Constant Basis

                     Mean Deviation Method                 Flugel’s Method
Test              43       44       45       46        43       44       45       46
[illegible]     .3051    .3062    .3908    .3375     .3171    .2365    .4339    .3280
43                       .5100    .4557    .5086              .4630    .3962    .4385
44                                .4003    .4928                       .4199    .4200
45                                         .4826                                .4083
Average                                    .4190                                .3861

Time Held Constant Basis

                     Mean Deviation Method                 Flugel’s Method
Test              43       44       45       46        43       44       45       46
[illegible]     .2747    .1795    .3477    .3059     .2948    .1961    .3973    .2691
43                       .4464    .4345    .4981              .4253    .3069    .4209
44                                .3153    .3670                       .2871    .3078
45                                         .4939                                .3545
Average                                    .3663                                .3260

forms approximated half the content of the
long forms. Such evidence as this should
allow the experimenter to provide ample practice
periods for subjects with no fear of practice
effects.

Corresponding halves of these “o” tests
were correlated and stepped up by the Spearman-Brown
formula to note their comparisons
with the intercorrelations resulting from
the complete tests. On the “Amount Held
Constant” basis the average of the intercorrelations
on the first half of each test was
.4554. For the second half it was .4914. The
average of the intercorrelations on the complete
tests was .4188. On the “Time Held
Constant” basis, these averages were respectively
.4332, .3892, and .3605. Thus, the
Spearman-Brown formula applied to corresponding
halves is sensitive enough to yield


results differing by only .03 to .07 from those
obtained from the complete tests. 

The results of this section indicate that the
“o” tests are stable in yielding reliability coefficients
of the same magnitude on the continuous
half method as well as on identical
forms. Practice effects are negligible and
length plays relatively the same role in reliability
as in validity. The Spearman-Brown
formula applied to intercorrelations of corresponding
halves yields results similar to
those obtained from the complete tests.


THE EFFECT OF INCREASING THE TIME UNIT 
ON THE MEASUREMENT OF OSCILLATION 


The results of a preliminary investigation of
25 cases on varying the time unit pointed to
the operation of a law in measuring oscillation.
The possibilities inherent in such an


TABLE II

RELIABILITY OF 117 CASES OF “o” TESTS INVOLVING TWO DIFFERENT METHODS OF SCORING:
ALL RESULTS STEPPED UP BY THE SPEARMAN-BROWN FORMULA

                            Mean Deviation Method       Flugel’s Method
               Identical    Amt. Held    Time Held    Amt. Held    Time Held
Test           Forms        Constant     Constant     Constant     Constant
[illegible]     .6178        .4961        .5857        .5215        .4937
[illegible]     .6943        .7615        .6609        .6937        .6005
[illegible]     .5941        .6727        .5209        .5967        .4322
[illegible]     .6610        .6695        .6897        .6921        .6029
[illegible]     .6834        .6563        .6563        .4943        .4943
Average         .6501        .6512        .6227        .5997        .5247


















March, 1938] A STUDY OF OSCILLATION AS A UNITARY TRAIT 226
[Figure 1. Vertical axis: sum of intercorrelations. Horizontal axis: scoring interval expressed in seconds. Curves: Mean Deviation and Flugel’s Method, each for Adults and for Children.]


Figure 1.—The graphs of oscillation for children and adults based on 
two methods of scoring on varying time units. 


operative factor warranted an investigation on 
the larger sample of 117 cases. 


Basis of Procedure 

The oscillation tests were scored on the 
basis of 5-, 10-, 15-, 20-, and 25-second inter- 
vals. Two methods of scoring were investi- 
gated. 
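
The re-scoring on wider time units can be sketched as follows. This is only an illustration under stated assumptions: it supposes that a subject's record is a list of outputs per 5-second interval, and that the mean deviation score is the average absolute deviation of the interval outputs from their own mean. The scoring rules actually used are not given in this section and may differ; Flugel's method is not attempted. All function names are hypothetical.

```python
from statistics import mean


def rebin(counts, factor):
    """Sum consecutive 5-second counts into intervals `factor` times as wide.

    A trailing group shorter than `factor` is dropped so that every
    interval covers the same span of time.
    """
    usable = len(counts) - len(counts) % factor
    return [sum(counts[i:i + factor]) for i in range(0, usable, factor)]


def mean_deviation(values):
    """Average absolute deviation of the values from their own mean (assumed score)."""
    centre = mean(values)
    return mean(abs(v - centre) for v in values)


def scores_by_time_unit(counts, factors=(1, 2, 3, 4, 5)):
    """Map each interval width in seconds (5, 10, ..., 25) to its score."""
    return {5 * f: mean_deviation(rebin(counts, f)) for f in factors}
```

A record that alternates regularly between high and low output shows the general effect: the fluctuation is visible at the 5-second unit and is averaged away when adjacent intervals are pooled.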

In order to provide the same length for all
subjects on a test, the “Time Constant” basis
was used. Three tests were approximately 
45 five-second intervals in length while the 
other two were 70 and 114 five-second inter- 
vals. The last step involved the intercorre- 
lations of these scores on the five different 
time units according to the two methods of
scoring.

Interpretations

The preliminary investigation dealt with 
children, the 117 cases with adults. In Fig- 
ure 1, the graphs have been displayed to- 
gether. It is obvious that scoring by either 
method on the smallest available time unit 
yields the best results. One might raise the 
question as to how small this unit of time 
ought to be in order to produce the maximum 
measure of “o”. Beyond a 15-second interval,
the “o” scores tend rapidly toward zero.

The parallelism of the two methods comes
between the 10-second and the 15-second intervals
as well as between the 20- and 25-second
intervals for children while the plateau
in the mean deviation method occurs between
the 15- and 20-second intervals. But in the
graphs of the adults, the parallelism occurs
between the 5- and 10-second intervals and
the plateaus come between the 10- and 15-second
intervals. The almost direct downward
descent in Flugel’s method for children
starts at the 15-second interval while a somewhat
smoother descent starts at the 10-second
interval for adults. Likewise the sharp
descent of the mean deviation method for children
operates at the 20- to 25-second interval
while a smoother descent operates at the 15-
to 20-second interval for adults.

It is such evidence as this that points to 
the idea that perhaps maturation may be in- 
volved in this factor of oscillation. The more 
natural tendency has been conditioned by ex- 
perience until it takes a finer unit of measure 
to ascertain its presence. 


FACTOR ANALYSIS OF THE DATA


The factor analysis of these data proceeds
according to the Two-Factor method of Spearman.
Aside from using the tetrad criterion,
residuals were resorted to most freely. Resid- 
uals should be insignificant if the tetrads 
vanish for the set of variables. The agree- 
ment of possible factors with that of the 
large residuals has been shown to be suffi- 
ciently accurate for the allocation of suspected 
extra factors.⁵ Whenever critical points
arose, tetrads were used as a check. Thus 
the tedious task of computing thousands of 
tetrads was eliminated, the method of resid- 
uals and, if necessary, the formulation of 
more than one pattern proving the more 
expedient. 
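
The relation between vanishing tetrads and negligible residuals can be shown in a short sketch. It assumes the usual definition of the three tetrad differences for four variables and a pure one-factor pattern in which each correlation is the product of two loadings. The loadings below are invented for illustration and are not values from this study.

```python
from itertools import combinations
from math import comb


def tetrad_differences(r, a, b, c, d):
    """The three tetrad differences for variables a, b, c, d of matrix r."""
    t1 = r[a][b] * r[c][d] - r[a][c] * r[b][d]
    t2 = r[a][b] * r[c][d] - r[a][d] * r[b][c]
    t3 = r[a][c] * r[b][d] - r[a][d] * r[b][c]
    return t1, t2, t3


def one_factor_correlations(loadings):
    """Correlation matrix implied by a single common factor."""
    n = len(loadings)
    return [[1.0 if i == j else loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]


def residuals(r, loadings):
    """Residual correlations left after removing the common factor."""
    n = len(loadings)
    return [r[i][j] - loadings[i] * loadings[j]
            for i, j in combinations(range(n), 2)]


def tetrad_count(n_variables):
    """Number of tetrad differences among n variables (three per set of four)."""
    return 3 * comb(n_variables, 4)


# Hypothetical loadings on one common factor.
loadings = [0.8, 0.7, 0.6, 0.5]
r = one_factor_correlations(loadings)
tetrads = tetrad_differences(r, 0, 1, 2, 3)   # all vanish under one factor
left_over = residuals(r, loadings)            # all zero as well
```

For 29 variables `tetrad_count(29)` gives 71,253 tetrad differences, which indicates why the method of residuals was the more expedient.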


⁵ Spearman-Holzinger, Preliminary Report on the Unitary
Trait Study, No. 2, p. 23. Prepared at the Statistical Laboratory,
Department of Education, University of Chicago, 1934.

Indications of Factors

The intercorrelations of the 29 variables in
general show a high degree of consistency
amongst the clusters of tests for each assumed
factor and, with the exception of mental
speed, the correlations amongst the tests representing
different assumed factors are low.
Tetrads for each of these other sets of four or
more variables are generally significant. If
it were not for this overlap, the large value of
rp for mental speed would indicate a satisfactory
measure of such a composite. With the
exception of the perseveration tests, the poolings
in Table III indicate that measures of the
various assumed factors are fairly satisfactory.
Such evidence as this allows one to proceed
with the factorization of the data.


TABLE III

TETRADS AND CORRELATIONS rp FOR EACH
ASSUMED FACTOR

Assumed            P. E. (largest tetrad
Factor             in set)                     rp
Spatial            P. E. (-.0151)  .0271     .9388
Mental Speed       P. E. ( .2356)  .0188     .9556
Motor Speed        P. E. ( .0944)  .0537     .8500
Perseveration      P. E. (-.0462)  .0116     .7024
Oscillation        P. E. ( .0772)  .0829     .8946
Attention          P. E. (-.1095)  .0312     .8637


The Patterns 


The true factor pattern for any set of vari- 
ables is probably very complex. The chief 
guiding principle in selecting a factor pattern 
is to try the simplest one first. A consider- 
able number of patterns might be formulated 
as defensible explanations of underlying cor- 
relations. The arrangement of these pat- 
terns determines the order in which factors 
are removed and since the first factor removed 
is generally favored, there arises a certain 
amount of ambiguity in the results. This 
ambiguity has been removed by Holzinger⁶ in
the formulation of the “hollow staircase” type.
In this pattern, after the principal factor has 
been removed, the remaining factors may be 
taken out in any order and the factor loadings 
will not change their order of magnitude. 


The investigator set up four patterns. Pattern
A involved four tests of a kind for five
of the factors and five tests for the “o” factor.
Pattern B dealt with the consideration
of an overlap between space and attention, involving
the same variables as in Pattern A.
Only five factors were dealt with in Pattern
C, the perseveration tests being omitted because
of their low intercorrelations and the
wide fluctuation with other variables. Pattern
D was applied to all 29 variables. The
two extra tests, memory and fluency, were
allocated to one of the six assumed factors on
the basis of their correlations. With the exception
of Pattern B, the “hollow staircase”
type was used.

⁶ K. J. Holzinger and F. Swineford, “Uniqueness of Factor
[illegible],” Journal of Educational Psychology, XXII (1932).

⁷ Spearman-Holzinger, Op. Cit., No. 5, p. 5.

Results of the Four Patterns

In order to secure a basis of comparison for
the secondary factors throughout each of the
four patterns, the average variance for each
of the factors was calculated. The results
are summarized in Table IV.

TABLE IV

THE VARIANCE OF THE GENERAL FACTOR, THE
SECONDARY FACTORS, AND THE SPECIFIC
FACTORS FOR FOUR PATTERNS

               Per Cent of Variance of Factors
Pattern      General      Secondary    Specifics
A             16.62         28.00        55.37
B          [illegible]      29.23        55.38
C             21.36         29.54        49.09
D             18.56         24.76        56.68


Undetermined specifics claim from 49 to 57 
percent of the factor weights while the general 
factor variance in no case exceeds the variance 
of the secondary factors. In Pattern B the 
variance of the secondary factors is nearly 
double that of the general factor. Evidently 
the assumption of a spatial-attention factor 
has not increased the specifics but has favored 
the secondary factors at the expense of the 
general factor. The elimination of the poor 
set of “P” tests in Pattern C has decreased 
the specifics and increased the general factor. 

In many instances the general factor loadings
for a test exceed those of the secondary
factor. It is this overshadowing of the secondary
factors that makes it highly desirable
to be able to analyze or split up this general
factor into more specific and meaningful
components.


It seems fitting to comment at this point on 
the recent findings of the Committee on Uni- 
tary Traits. Mr. Holzinger remarks as 
follows: 

    In attempting to analyze the U₁ (general)
    factor we have eliminated Grade as well as
    Age . . . comparing this . . . wherein only
    Age has been eliminated, we find that the
    weights of U₁ drops from 28.81 per cent of
    variance . . . to only 14.21 per cent . . .
    These results are very promising because
    they indicate the possibility of resolving
    the U₁ factor into a number of basic
    maturity factors.⁸


The fact that the general factor variance in 
all four patterns of this study does not in any 
instance equal the amount of the secondary 
factor variance would tend to support the 
findings of Mr. Holzinger. The subjects of 
the present investigation were adults. Would 
not age and maturation in this case be mini- 
mized thereby reducing the general factor 
loadings and increasing the secondary 
weights? 

The relative importance of the secondary
factors is given in Table V. It is evident
from this table that of all 29 variables studied,
the spatial and “o” tests are the best devised
because they are more highly saturated with
what they are supposed to measure. Even at
that, the saturation amounts only to approximately
40 per cent. Thus the big problem
for the factorists still remains: that of devising
tests that produce high saturations of
what they are intended to measure.


⁸ Spearman-Holzinger, Op. Cit., No. 6.


TABLE V

AVERAGE LOADINGS FOR SECONDARY FACTORS FROM FOUR PATTERNS

                                   Pattern
Factor               A            B            C            D         Average
Spatial            .4458        .3711        .4342        .3670        .4045
Mental Speed       .3062        .3326        .2685        .2281        .2839
Motor Speed     [illegible]     .0935        .1910        .1154        .1383
Perseveration   [illegible]     .1763         ....        .1419        .1615
Oscillation        .3868        .3672        .3950        .4067        .3889
Attention          .1958         ....        .1640        .1930        .1843
[illegible]         ....        .1972         ....         ....        .1972

The Magnitude of the Final Residuals for the
Four Patterns

Table VI shows the distribution of the re- 
siduals for each pattern with their means, 
standard deviations, and probable errors. The 
smallest mean, almost zero, occurs in Pattern
D where there are the greatest number of re- 
siduals recorded. It is interesting to note 
that the highest mean occurs with the lowest 
frequency and with the increase of the number 
of residuals, the mean decreases. This in- 
direct variation would seem to indicate that 
the variation in the mean is due to chance. 

With only a difference of .0081 between the
highest and lowest standard deviation, each 
pattern may be considered as an equally good 
fit. The slight increase in Pattern D is prob- 
ably due to the allocation of extra factors. 
The elimination of the perseveration tests 
yielded a slightly lower standard deviation in 
Pattern C but not enough to be judged 
significant. 

Multiplying the standard deviation of each
pattern by .6745 and comparing the products with
the P. E. of zero correlation for 117 cases, one finds that the
P. E. of the standard deviations are all less
than the probable error of zero correlation.
Using this as a rough basis of approximation,
one can say that the final residuals of each
pattern may be considered negligible.⁹


The indications of factors by the various
statistical techniques have been verified by the
resulting factor patterns. Each pattern, judging
by the P. E. of its standard deviation
compared with the P. E. of zero correlation
for the same number of cases, could be considered
a good fit. The spatial and “o” tests
were most highly saturated with their respec- 
tive factors. 
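
The comparison just described can be checked with a few lines of arithmetic. The sketch assumes the customary probable-error constant of .6745 and takes the P. E. of a zero correlation as .6745 divided by the square root of the number of cases. The standard deviations are those given for Patterns A to D in Table VI; the function names are illustrative.

```python
from math import sqrt

PE_CONSTANT = 0.6745  # probable error as a fraction of the standard deviation


def pe_of_zero_correlation(n_cases):
    """Probable error of a correlation of zero for n_cases subjects."""
    return PE_CONSTANT / sqrt(n_cases)


def pe_from_sd(sd):
    """Probable error corresponding to a standard deviation."""
    return PE_CONSTANT * sd


# Standard deviations of the final residuals, Patterns A to D (Table VI).
residual_sds = {"A": 0.0842, "B": 0.0872, "C": 0.0826, "D": 0.0907}

limit = pe_of_zero_correlation(117)
pattern_pes = {name: pe_from_sd(sd) for name, sd in residual_sds.items()}
all_below_limit = all(pe < limit for pe in pattern_pes.values())
```

Rounded to four places the products are .0568, .0588, .0557, and .0612, each below the limit of .0624, which agrees with the last two rows of Table VI.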


SUMMARY AND INTERPRETATIONS 

The findings presented in this study lead to
the following conclusions:

(1) Length up to about 60 five-second intervals
for “o” tests is sufficient to secure a
good measure of “o”. Good reliability results
are secured from oscillation tests 60 to
[illegible] five-second intervals in length.

⁹ Spearman-Holzinger, Op. Cit., No. 4, pp. 3-4.

TABLE VI

FREQUENCY DISTRIBUTIONS OF FINAL RESIDUAL CORRELATIONS FOR PATTERNS A, B, C, AND D

Value of Residual        Pattern A  Pattern B  Pattern C  Pattern D
  .2700                  [1]†
  .2500                  [1, 1, 2]†
  .2300                  [2, 1]†
  .2100                  [1, 2, 2]†
  .1900                       4          2          1          1
  .1700                       5          5          3         10
  .1500                       7          6          6          5
  .1300                       7          8          8         14
  .1100                      12         10          6         14
  .0900                      15         22         11         17
  .0700                      19         18         18         20
  .0500                      25         20         19         24
  .0300                      22         22          7         38
  .0100                      17         24         23         32
 -.0100                      42         39         29         35
 -.0300                      29         26         13         43
 -.0500                      20         25         16         34
 -.0700                      25         21         20         30
 -.0900                      14         15         11         30
 -.1100                      11          9          6         10
 -.1300                       2          5          3         14
 -.1500                       6          5          3          6
 -.1700                       6          9          1         11
 -.1900                       5          1          2          4
 -.2100                       3          2          2          5
 -.2300                  [1, 1, 1]†
 -.2500                  [1, 3]†
 -.2700                  [1]†

N                           300        300        210        406
Mean                      .0074      .0118      .0130      .0004
S. D.                     .0842      .0872      .0826      .0907
.6745 X S. D.             .0568      .0588      .0557      .0612
P. E. zero correlation    .0624      .0624      .0624      .0624

† Fewer than four entries survive for this row; the patterns to which they belong cannot be determined from the extracted text.

(2) The application of the Spearman-Brown
formula to the continuous halves of
the “o” test in general predicted results comparable
with the intercorrelations of the complete
tests.

(3) Practice in no way affects the scores
on the “o” tests.

(4) The mean deviation method for scoring
the “o” tests proved superior to Flugel’s
method in both validity and reliability.

(5) The narrowness of the time unit in
scoring proved to be a most significant fact
both in securing good measures of “o” and in
yielding good reliability coefficients.

(6) The graphs of the two sets of data for
oscillation from adults and children reveal
similar characteristics.

(7) There is some evidence to show that
children oscillate over a wider range than do
adults.

(8) The poolings of the “o” tests as well as
their factor weights indicate that they are
fairly good measures of “o”.

In general the facts indicate that “o” is a
very sensitive behavior-unit. Only by narrowing
the time basis can the individual differences
of this trait be measured. Oscillation
turns out to be a very definite component
in human abilities as revealed by the
factor loadings, the tests being saturated
with the “o” loadings and possessing but
small loadings with the general factor.























SOME VERBAL ASPECTS OF THE 1937 REVISION OF THE 
STANFORD-BINET INTELLIGENCE TEST, FORM L 


ELDEN A. BOND
Research Assistant
Teachers College, Columbia University

It is widely believed that the old revision of
the Stanford Binet included many test items
which depend on verbal knowledge for their
solution. In fact some of the items, such as
the vocabulary tests and abstract words tests,
are tests of word knowledge. When Terman
and Merrill constructed the new revision they
attempted to reduce the number of these verbal
items and were to a great extent successful
on the lower levels. They found it extremely
difficult to devise non-verbal tests for
the upper levels. The question therefore
arises whether the tests on the upper levels are
suitable for pupils subject to specific disability
in reading. The purpose of the study reported
in this article was to determine what
items in the upper ranges (Year XIV, Average
Adult, Superior Adult I, Superior Adult
II, and Superior Adult III levels) seem to be
more difficult for the poor reader than for
the good one.

During the Spring of 1937, the 346 second-term
ninth-grade students in the Mansfield
City Schools, Mansfield, Ohio, were given the
1937 Revision of the Stanford Binet, Form
L,* and the Iowa Silent Reading Test, Form
A;* and from among these, sixty-five good
readers were matched with sixty-five poor
readers, on the basis of three criteria: sex,
chronological age, and intelligence quotient.
The differences between the good and poor
readers in each matched pair were at least one
sigma in total comprehension on the reading
test used. These children were also matched
on years in school inasmuch as the policy of
the Mansfield City System is to advance the
children one grade a year, segregating those
encountering difficulty for special remedial
instruction.* The statistical analysis of the
matching is given in Table I.

The correlations of .97 and .73 and the
critical ratios of .7 and .6 indicate that the
groups are very similar with respect to intelligence
quotient and chronological age. Because
reading is highly correlated with intelligence,*
the children with high intelligence
quotients tended to make the highest reading
scores in each group. The correlation of [.90]
in reading ability between the good and poor
readers is evidence of this. However, the
critical ratio of 39.0 definitely establishes the
fact that the two groups are distinctly different
in reading ability. The homogeneity of
the two groups with respect to chronological
age results in the lower correlation of .732.

An analysis of the errors on each Binet test
item was made, and a total of the errors on
each item was taken. It was found that the
poor readers had one or two fewer errors on
most of the items with the exception of those
reported below in Table II.

* A more detailed report of the method of segregating the
unsuccessful children from the school population is given in
an article by the author in the Journal of [illegible],
July, 1937.

TABLE I

SUMMARY OF THE MATCHING OF THE GOOD AND POOR READERS

N   65 Pairs

                                                   Chronological Age        Iowa A
                         Intelligence Quotient     (Months)                 (Raw Score)
                         Good        Poor          Good        Poor         Good        Poor
                         Readers     Readers       Readers     Readers      Readers     Readers
Mean                     108.68      108.42        177.46      177.2[?]     115.09      72.17
S. D.                    [illegible] 11.7          4.5         4.6          21.32       19.41
r                              .97                       .73                      .90
Diff. between Means
  / σ Diff.                    .7                        .6                       39.0

TABLE II

TEST ITEMS MISSED MORE FREQUENTLY BY THE POOR READERS

Year Level    Item
XIV           1, Vocabulary
AA            1, Vocabulary
              3, Abstract Words
SA I          1, Vocabulary
              3, Sentence Building
SA II         1, Vocabulary
SA III        1, Vocabulary

Since these cases were matched on the basis
of intelligence quotient, it was necessary for
the poor readers to make higher scores on
other items to compensate for their relatively
poor showing on the verbal tests.

Intelligence quotients were then established
by regrading each of the Binet tests, omitting
those items listed in Table II. When any
items were omitted from a year level the procedure
was to increase the credit for the remaining
items. The results of this treatment
changed some of the intelligence quotients as
much as fifteen points, usually in favor of the
poor readers. The statistical analysis of these
results is given in Table III.


TABLE III

SUMMARY OF THE MATCHING OF THE INTELLIGENCE
QUOTIENTS OF THE GOOD AND POOR
READERS AFTER THE OMISSION OF TEST
ITEMS LISTED IN TABLE II

                         Good        Poor
                         Readers     Readers
Mean                     107.52      110.45
S. D.                     11.8        12.8
r                              .93
Diff. between Means
  / σ Diff.                    5.0


The results listed in Table III indicate 
that after the correction had been made the 
poor readers on an average had a higher in- 
telligence quotient than the good readers. 
The correlation between the good and poor 
readers is still rather high, but lower than be- 
fore, and the critical ratio of 5.0 indicates 
that the two groups are significantly different 
in intelligence quotient. 
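
The critical ratio of 5.0 can be reproduced approximately from the entries of Table III. The sketch below assumes the customary formula for the standard error of a difference between correlated means, with each standard error taken as S. D. divided by the square root of the number of pairs. The article does not state the formula used, so this is offered only as a plausible reading; the function name is illustrative.

```python
from math import sqrt


def critical_ratio(mean_a, mean_b, sd_a, sd_b, r, n_pairs):
    """Difference between matched-group means over its standard error.

    Assumes the standard error of each mean is sd / sqrt(n_pairs) and that
    the two sets of scores correlate r across the matched pairs.
    """
    se_a = sd_a / sqrt(n_pairs)
    se_b = sd_b / sqrt(n_pairs)
    se_diff = sqrt(se_a ** 2 + se_b ** 2 - 2.0 * r * se_a * se_b)
    return abs(mean_a - mean_b) / se_diff


# Entries of Table III: good readers first, poor readers second, 65 pairs.
cr = critical_ratio(107.52, 110.45, 11.8, 12.8, 0.93, 65)

# The same means and spreads with a lower correlation between the groups.
cr_lower_r = critical_ratio(107.52, 110.45, 11.8, 12.8, 0.80, 65)
```

Under these assumptions `cr` comes to about 5.0. The formula also shows why a high correlation between matched groups enlarges the critical ratio of a given difference: the larger r is, the smaller the standard error of the difference.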


Conclusions 


1. Ninth grade children with poor reading 
ability tend to have more difficulty with the 
verbal elements of the 1937 Revision of the 


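TABLE II (continued)

                                               Errors
                                         Good        Poor
Year Level    Item                       Readers     Readers
XIV           1, Vocabulary                 2           7
AA            1, Vocabulary                28          40
AA            3, Abstract Words            19          26
SA I          1, Vocabulary                48          56
SA I          3, Sentence Building         43          49
SA II         1, Vocabulary                5[?]        64
SA III        1, Vocabulary                60          65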


Stanford Binet, Form L, than do children with 
good reading ability. Reading disability cases 
are unable to read as much, or as difficult 
materials, as good readers, and as a result they 
do not come in contact with as many words. 
Their vocabularies are not so well developed, 
and they are handicapped when taking any 
test which utilizes verbal elements in its con- 
struction. The fact that the poor readers did 
better on some of the non-verbal materials in 
this study is not an indication that they are 
correspondingly better on that type of ma- 
terial. The poor readers did only slightly 
better on each of the non-verbal test items 
(probably because they had to compensate at 
some place), since they were originally 
matched on the basis of intelligence quotient. 

2. It seems advisable to omit all the verbal 
tests listed in Table II when administering the 
1937 Revision of the Stanford—Binet to a 
reading disability case. 

The writer feels that the Vocabulary Test 
Items on the VIII and XII Year Levels 
would also be a handicap for poor readers. 
Some of the other test items, such as Reading 
and Report on the X Year Level, abstract 
words on the XI and XII Year Levels, and 
the dissected sentences on the XIII Year 
Level probably handicap poor readers. A 
similar study should be made using fifth or 
sixth grade children to establish significance 
of the verbal items on these lower levels. 

3. After the corrections were made on the 
Binets, some of the pairs differed as much as 
twenty points in intelligence quotient. The 
average was about three points, in favor of 
the poor readers. It has been the practice 
of some school psychologists to add a constant
of from three to seven points to an intelligence
quotient obtained by a reading disability
case. This practice is open to question,
since it is impossible to determine during
a test situation how much the reading ability
is affecting the test results. A more logical 
method would be to omit the verbal items re- 
ported above and to determine the intelligence 
quotient on the basis of the remaining test 
items. 

4. The change of approximately three intel- 
ligence quotient points in favor of the poor 
readers is not exactly a fair picture. Some of 
the matched pairs did not take the test items 
on the levels where all, or most of the correc- 
tions were made. Consequently they pulled 
down the averages. By omitting the lower 23 
matched pairs an average intelligence quo- 
tient of 117.7 was obtained for the poor read- 
ers and an average intelligence quotient of 
113.4 was obtained for the good readers. This 
difference of 4.3 is more significant than that 
reported in Table III. By omitting these
cases a lower correlation and a higher critical
ratio would also be obtained.

FLUCTUATIONS IN THE CORRELATION BETWEEN PSYCHOLOGICAL
TEST SCORES AND UNIVERSITY GRADES

DEWEY B. STUIT
University of Nebraska


The psychological examination is an instrument
widely used for the purpose of predicting
educational and vocational success. In
college personnel work we are anxious to
de[text missing]ble success in academic work. It is very
desirable, therefore, to have available instruments
which will predict scholastic success
with a rather high degree of precision. The
effectiveness of a personnel program is in part
dependent upon the intelligent use of prognostic
measuring devices.

It was the purpose of this investigation to
determine the relationship between Ohio Psychological
test scores and freshman grades
over a five year period. If there is considerable
relationship between grades and psychological
test scores, then it should be possible
to use the psychological examination for the
purpose of guiding the individual with respect
to his educational program. Particularly is it
necessary to know whether the instrument
used for purposes of prediction operates with
a great deal of consistency from semester to
semester and from year to year. If we can
expect the same degree of efficiency from year
to year, then it becomes possible to place
greater reliance upon the psychological test as
a prognostic instrument in college personnel
work than would otherwise be the case.

The Ohio Psychological Examination, Form
17, was administered to the majority of
Teachers College freshmen from 1932 to 1935
inclusive. Form 18 of the examination was
used in 1936-37. The scores on these tests 
are recorded in the office of the freshman ad- 
viser together with other personnel data per- 
taining to Teachers College freshmen. 

Grades are reported at the University of 
Nebraska in terms of percentages. Roughly 
speaking a grade of 60-70 can be considered 
a D, 70-80 a C, 80-90 a B, and 90-100 an A.
The university average is usually about 75 or 
76. In this study grades are reported in 
terms of weighted averages, that is, a five-hour 
course carrying a grade of 80 is given a cor- 
respondingly greater weight in the average 
than a three hour course in which a grade of 
80 was earned. 
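
The weighting described here amounts to an average in which each grade counts in proportion to the credit hours of its course. A minimal sketch follows; the course list is hypothetical and the function name is illustrative.

```python
def weighted_average(courses):
    """Credit-hour weighted average of percentage grades.

    `courses` is a list of (credit_hours, grade) pairs.
    """
    total_hours = sum(hours for hours, _ in courses)
    return sum(hours * grade for hours, grade in courses) / total_hours


# Hypothetical semester: a five-hour course at 80 and a three-hour course at 70.
semester = [(5, 80), (3, 70)]
average = weighted_average(semester)  # (5*80 + 3*70) / 8
```

The five-hour course pulls the result toward 80, giving 76.25 where an unweighted average of the two grades would give 75.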

TABLE I

MEANS, STANDARD DEVIATIONS AND COEFFICIENTS OF CORRELATION FOR
FRESHMEN GRADES AND OHIO PSYCHOLOGICAL TEST SCORES

                              Mean             S.D.             Mean             S.D.               r
Year             N         Psych. Score     Psych. Score       Grade            Grade         Grade & Score
              Semester       Semester         Semester        Semester         Semester         Semester
              1      2       1       2        1       2       1       2        1       2        1       2
1932         206    176    96.60   97.90    32.80   37.90   76.86   77.88     9.48    8.10     .46     .41
1933         169    153    93.50   94.40    33.20   33.20   77.37   77.07     8.43    9.00     .54     .58
1934         245    202    90.40   94.60    34.30   33.70   75.96   76.53     9.84    8.58     .53     .45
1935         231    203    89.70   92.79    33.10   33.00   77.58   78.75    10.23    8.46     .43     .46
1936*        [?]    206    76.20   76.61    22.15   22.42   78.56   76.87     8.58    9.95     .62     .55

* Form 18

The most significant findings of the study
are reported in Table I. It will be noted that
the correlations for the first semester vary
from .43 to .62 and for the second semester
from .41 to .58. As one would expect, the
best results were obtained with Form 18.
The standard errors of these correlations vary
from .04 to .06. It will be noted also that in
1933-34 and in 1935-36 the correlation for
the second semester is greater than it was for
the first semester. While one might anticipate
a lower correlation for the second semester
because of a more homogeneous population
being enrolled at that time, the results do not
bear out this hypothesis. Examination of the
standard deviations will show that the variability
of the group each year was much the
same during both semesters. 
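
The standard errors of .04 to .06 mentioned above are consistent with the classical large-sample formula for the standard error of a correlation coefficient, (1 - r²) / √N. The article does not say which formula was used, so the sketch below is an assumption, and the function name is illustrative. The two cases are the first and second semesters of the first row of Table I.

```python
from math import sqrt


def standard_error_of_r(r, n):
    """Classical large-sample standard error of a correlation coefficient."""
    return (1.0 - r ** 2) / sqrt(n)


# First row of Table I: r = .46 with N = 206, and r = .41 with N = 176.
se_first_semester = standard_error_of_r(0.46, 206)
se_second_semester = standard_error_of_r(0.41, 176)
```

Rounded to two places these come to .05 and .06, within the range reported.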

The results of this investigation are on the 
surface not in agreement with those reported 
by Williamson* for the Arts College of the 
University of Minnesota. Williamson found 
the correlation between college aptitude test 
scores and scholarship to decrease from 1928 
to 1935. Correction for homogeneity did not 
remedy the situation. Williamson offers the 
suggestion that educational reorganization at 
the University of Minnesota probably ac- 
counts for the decreasing relationship between 
aptitude test scores and university scholarship. 
If personnel work is effective, low aptitude 
students should be guided into courses which 
are commensurate with their abilities. It is 
likely, therefore, that such procedures will re- 
duce the relationship between grades and apti- 
tude test scores. Since only students with 
high aptitude ratings are permitted to register 
in the Arts College, it is possible that grade 
standings have not been adjusted to the na- 
ture of the student population. Students who 
in a more heterogeneous group would receive 
above average grades are now probably aver- 
age or below. It looks, therefore, as though 
personnel work by its very nature should oper- 
ate in such a way as to reduce the magnitude 
of the correlation coefficients measuring the 
relationship between university grades and 
college aptitude. 

The conditions at the University of Ne- 
braska are quite different from those prevail- 
ing at the University of Minnesota. No se- 
lective admission standards are in operation, 
and no educational reorganization has taken 
place. It is true that Teachers College has 
a freshman personnel program, but this has 
not operated in such a way as to reduce the 
heterogeneity of the group. Furthermore, 
this program receives greatest emphasis dur- 
ing the first semester, inasmuch as all fresh- 
men are required to take the orientation 
course at that time. Hence, it appears likely 
that the chance for reducing the magnitudes 
of the coefficients is as great during the first 
semester as it is during the second semester. 

The writer is of the opinion that the serious 
economic conditions of the past five years 
have played their part in disturbing the cor- 
relations. In an unreported phase of this in- 
vestigation it was found that the average psy- 
chological test score of those dropping out at 
the close of the first semester is only about 
ten points below the average of the Teachers
College group. Hence, the group remains
heterogeneous from the first to the second
semester. Unequal motivation and unequal
encouragement also probably resulted in vari-
able conditions which served to reduce the
magnitude of the coefficients in some instances
while increasing them in others.

An interesting sidelight of the investigation
and a possible explanation of the fluctuation
in the size of the correlations from year to
year is the possible change in freshman intel-
ligence from 1932 to 1936. Thompson,’ in a
combination questionnaire study and survey
of the data reported in the Educational Rec-
ord concerning the American Council on Edu-
cation psychological test scores, found indica-
tions of an increase in freshman intelligence
in the colleges which he included in his inves-
tigation. Edgerton’ reported similar findings
in a study of the intelligence of freshmen en-
rolled in the colleges belonging to the Ohio
College Association. Williamson,’ on the
other hand, using the Minnesota College Apti-
tude Test as the means for measuring the
abilities of the students entering the Univer-
sity of Minnesota, failed to find a decided
change in freshman intelligence for the uni-
versity as a whole. In certain colleges
changes were brought about as a result of new
admission standards which were put into
effect in 1932.

The results of the present study are not
in harmony with those reported by William-
son. From 1932 to 1935 there was actually
a decrease in the average score in Form 17 of
the Ohio Examination. Since no studies have
been made of the other colleges in the univer-
sity, it would be difficult to determine whether
the decrease was characteristic only of Teach-
ers College or if it indicated a general trend
during the four year period. It happens that
the average score in Form 18 was correspond-
ingly somewhat higher, if the table of norms
for Forms 16, 17, 18, and 19 as furnished by
the Ohio State University is dependable
throughout the entire range of scores. It
may be, however, that the apparent “jump”
in intelligence is due to the test rather than
the change in freshman population.

In view of the fluctuations noted in the 
magnitudes of the coefficients of correlation
measuring the relationship between scholar- 
ship and psychological test scores, it seems 
advisable to suggest that a more careful study 
should be made of those individual students
who cause the magnitudes of our coefficients
to vary from semester to semester and from
year to year. These cases could be spotted
by examining the scatter diagrams constructed
for the purpose of calculating the coefficients
of correlation. By making an intensive study
of these cases it is possible that we might ob-
tain information which would enable us to
make better use of psychological tests in per-
sonnel procedures, and thus increase the ef-
fectiveness of our work. After all it is the
individual and not the group which is of
major concern to those who are interested in
the guidance function of education.


SUMMARY 


The magnitudes of the coefficients of corre-
lation measuring the relationship between
Ohio psychological test scores and university
grades of Teachers College freshmen at the
University of Nebraska tend to fluctuate from
semester to semester and from year to year.
While a drop in the magnitudes of the
coefficients of correlation for the second se-
mester might be expected, such was not found
to be the case in the present investigation. 
The causes of the fluctuations cannot be iden- 
tified through the study of averages and 
standard deviations. A critical study should 
be made of individual students in order to dis- 
cover those factors which cause the magni- 
tudes of the coefficients to fluctuate. 
PHOBIAS AND THE PRESSEY X-O TEST


N. FRANKLIN STUMP 
Professor of Psychology and Education 
Keuka College, Keuka Park, N.Y. 


The number of experimental studies which 
can be undertaken by the use of the Pressey 
X-O test¹ is almost unlimited. The extent
of application of this unique test to the inves- 
tigation of many interesting problems rests 
largely upon the ingenuity of the experi- 
menter. Utilization of this test for the dis- 
covery of personality traits in individuals has 
scarcely made a beginning. 

Section I of the Pressey test measures 
sensitivity to unpleasant topics and objects. 
The purpose of the test is not explained to 
the subject but he is given freedom of oppor- 
tunity to check words pertaining to sex, fear, 
disgust, and self-feeling. Section II of the 
test did not seem to have any direct relation- 
ship to the subject of this paper, phobias, so 
the results on this section were ignored. In 
Section III the subject checks situations 
towards which he holds a disapproving atti- 
tude. In Section IV he indicates topics about 
which he has worried, and suggestions con- 
cerning which he has felt nervous. 


The purpose of this study was to determine
the extent of differences on Test I, Test III,
and Test IV of the Pressey X-O Test, be-
tween those subjects with phobias and those
without phobias. The significance of the dif-
ferences between these two groups of subjects
on single and combined tests was determined.
Since Test I could be graded on the basis of 
the number of words checked in four different 
fields, in terms of dislikes, namely, sex, fear, 
disgust, and self-feeling, all of these elements 
were considered separately as well as in com- 
bined form. 


Before comparisons could be made between 
the two groups two factors were carefully con- 
sidered: (1) Are the subjects from one and 
the same population, differing significantly 
only with respect to the possession or non- 
possession of phobias? (2) Are the phobias 
possessed by the subjects sufficiently potent 
to affect seriously social adjustment in par- 
ticular situations or are the so-called phobias 
only “hypercritical or finicky” feelings which 
are common among the general population to
a greater or less degree?

In answer to the first question objective test
results, which were available for the subjects,
seemed the most adequate manner of deter-
mining the extent of similarity of the groups.
The results on the American Council Psycho-
logical Examination² were available; the per-
centiles rather than the gross scores being
used. It was found that the mean percentile
for the group with phobias was 73.9, without
phobias 69.6. There was a difference of only
4.3 percentile points between the two groups,
the percentile scores for the group with pho-
bias ranging from 52 to 95; without phobias
from 52 to 98, a range of 43 and 46
respectively.

To what extent is a difference of 4.3 in
general ability significant? The standard
deviations for the groups with and without
phobias were 14.60 and 16.48, respectively,
the standard errors of the means being 4.8-
and 5.80. The standard error of the differ-
ence was found to be 7.3; the ratio between
the obtained difference of the means and the
standard error of the difference being ex-
tremely small, namely, .59. Thus the value
of “P” according to “Student’s” tables³ is be-
tween .6 and .5. The conclusion is that the
difference in general ability between the two
groups is insignificant.
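
The arithmetic behind this ratio may be sketched in a few lines of Python. This is an illustration only, not part of the original computation: the variable names are hypothetical, and the figures are simply those quoted in the text, with the standard error of the difference taken as reported rather than recomputed.

```python
# Mean American Council percentiles quoted in the text.
mean_with_phobias = 73.9
mean_without_phobias = 69.6

# Standard error of the difference between the means, as reported.
se_difference = 7.3

# Ratio of the obtained difference to its standard error; this is the
# value that is looked up in "Student's" tables.
difference = mean_with_phobias - mean_without_phobias
ratio = difference / se_difference

print(f"difference = {difference:.1f}, ratio = {ratio:.2f}")
```

A ratio well below 1 means that the obtained difference is smaller than its own standard error, which is why the groups are treated as equal in general ability.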

Results which could be used for the equat-
ing of the groups were available from an-
other test. The Allport-Vernon Study-of-
Values⁴ test consists of six parts: Theoreti-
cal, Economic, Aesthetic, Social, Political and
Religious. While the two groups did not 
make identical average scores on these sec- 
tions, there was considerable similarity be- 
tween them throughout the entire test. In 
some instances, as would be expected, there 
is a slight advantage for the group with pho- 
bias, and in other instances a slight advantage 
for the without-phobias group. 

² Published by The American Council on Education, Wash-
ington, D. C.

³ Fisher, R. A., Statistical Methods for Research Workers.
Oliver and Boyd, London.
For the section of the Allport test dealing
with theoretical values the group with phobias
made an average score of 25.8; without pho-
bias, 23.0. The test-score results on the re-
maining tests for two groups were as follows:
Economic, 24.3, 27.1; Aesthetic, 33.1, 31.8;
Social, 31.4, 33.0; Political, 28.8, 25; Relig-
ious, 37.95, 38.0. These results show a slight
advantage, 1.9, in the first test for the group
with phobias; 2.8 increase in second test, for
the without-phobia group; 1.3 increase in
third for without-phobia group; 1.6 increase
in fourth, for without-phobia group; 3.8 in
fifth, for with-phobia group; .05, in sixth, for
without-phobia group.

The differences in general ability and in 
sense of values would indicate that the sub- 
jects are from one and the same population. 


The second question relative to the potency 
of the particular phobias and the extent to 
which they affected the social adjustment in 
certain situations seemed of major importance 
before comparisons could be accepted. Of 
course, the phobias are of varying degrees of 
strength in different individuals, but all of the 
fears seemed sufficiently serious to be classi- 
fied as real phobias. Because of the variety 
of phobias possessed by the group, it will be 
impossible to give even brief case histories⁵
for all of them. A few cases must suffice. 


Feather Phobia 

The subject cannot recall the reason for 
her extreme fear of feathers. She stated 
that she was not born with this fear and did 
not develop it in very early childhood, for 
her parents can recall her having made
pets of chickens to the extent of calling 
them by name and playing with them as a 
child might with a doll. However, she re- 
calls none of this; her only recollection is 
that she was dreadfully afraid of chickens, 
birds of all kinds, and all feathers. Living 
or dead birds frighten her and feathers of 
any kind are instruments of torture to her. 
Her older brothers often made use of this 
knowledge and made her do their bidding 
by the mere threat of touching her with a 
feather. Serious lapses in friendship have 
resulted from individuals trying to scare her 
with feathers. 


⁵ The writer is indebted to Miss Pogoda for the case his-
tories in this study. She is believed to be very compe-
tent in the recognition of real phobias because she possesses an
extreme unnatural fear in a specialized field, and is therefore
more able perhaps to recognize what should be regarded as a
real phobia in others.
Her mother, at some time, forced her into
a room and put a pet canary up to her
cheek in an attempt to cure her of this 
fear; but she became terrified, grew pale, 
screamed, and trembled uncontrollably. 


She actually avoids touching pictures of 
birds or feathers in books and never wears 
feathers in her hat — even stiff, artificial 
ones. She understands that no physical 
harm can come to her through contact with 
a feather, but she cannot make herself 
touch one. She has tried and has succeeded 
for a moment to hold a tiny, white feather 
in her hand, but, as soon as she realized 
what she was doing, she dropped it shudder- 
ing involuntarily. 

At college, she had an unpleasant experi- 
ence with a director who insisted that she 
wear a cloak decorated with feathers in a 
play. She insisted that she could not, and 
had the feathers torn off the cloak before 
she could possibly wear it. 

No other fears, as of bugs or mice, disturb
her, but her fear of feathers is amazingly 
acute. She would rather be placed in a 
cage with lions, she says, than in a cage 
with chickens. 


Spider-and-Millers Phobia 

From earliest recollections the subject 
was exceedingly fearful of spiders and mil- 
lers. She has no recollection of the origin 
of the fear but merely remembers always 
being afraid of spiders and millers. 


One of the first incidents in her experi- 
ence with this fear was at a time when she 
was seven years old. She was spending a 
summer at a camp in which there were 
many cobwebs and spiders. Her fear was 
so great that she slept with covers pulled 
over her head and awoke in a cold sweat. 

On other occasions she would refrain 
from opening windows at night (though she 
is an advocate of fresh air in great abund- 
ance) merely because a spider web was 
somewhere in the vicinity and the possibil- 
ity of millers flying in through the window 
was great. She never can summon enough 
courage to kill a spider or miller, but rather 
shrinks back in utter terror. Many un- 
pleasant situations and cruelties have re- 
sulted from this unnatural fear—such as: 
(1) The subject’s being chased by an older 
sister who pursued her with a spider held 
menacingly forth, (2) A college student 
placing spiders near her belongings or pur-
suing her with them.

This fear she claims is a real and vital 
one, far greater than the average individ- 
ual’s understandable squeamishness toward 
bugs and mice. 


Case histories have been prepared for each 
of the subjects, all college students, who pos- 
sessed an extreme unnatural fear which, at 
times, greatly affected her social adjustment, 
but space will not permit a presentation of
them. The phobias of the subjects were: 
falling from high places, being chased (just 
for fun), spiders, millers, bridges, sharp 
points, addressing audiences, fire, deep water, 
and feathers. 

Still another instance will show the unnat- 
ural degree of these fears. The individual 
who is extremely afraid of addressing audi- 
ences has caused no end of worry for herself 
during the past year and a half in college. In 
courses requiring oral reports before the class 
she has made special arrangements with pro- 
fessors to avoid speaking before the group. 
It is believed, by some instructors, that she 
would discontinue her college course rather 
than make an oral report. This is not due to 
any lack of ability to offer a satisfactory re- 
port but due to an absolute loss of emotional 
control when facing groups during a formal 
report. One of the professors inquired of her 
what she thought would happen if she were 
compelled to make an oral report before the 
class group. She replied that she believes 
that she would lose all consciousness and 
would perhaps fall to the floor in a faint. 


RESULTS 


Table I presents the means, standard devi- 
ations, standard errors of the means, standard 
errors of the differences between the means 
(for subjects with and those without pho- 
bias), the ratios of the actual differences be- 
tween the means and the standard errors of 
the differences, and the extent of the signifi- 
cance of the actual differences as determined 
from tables of “Student’s” distribution. 

Test I (affectivity scores) seems to ap- 
proach absolutely significant differences in 
discriminating between the subjects with and 
without phobias. The subjects without pho- 
bias dislike fewer objects and things. These 
subjects are also less sensitive to and express 
less dislike for words pertaining to sex, fear, 
disgust, and self-feeling. Whether there are, 
however, significant differences between the
subjects in all these respects is described in
the following paragraphs.

Test I (affectivity score) dealing with un- 
pleasant objects and topics, and “disgust” 
words in Test I have “P” values between .05
and .02 and, therefore, the means show sig-
nificant differences between the subjects with
and those without phobias. The ratios be- 
tween the actual differences of the means and 
the standard error of the differences are iden- 
tical, namely, 2.40. It is interesting to note 
that these two measures are far more signifi- 
cant in describing the differences between 
these two groups of subjects than are the 
combined affectivity scores on Tests I, III, 
and IV of the Pressey test. This may point 
to the necessity of increasing the repertoire 
of objects and things which are satisfying to 
those individuals who possess phobias. Since
many phobia cases offer rather stubborn prob- 
lems before the subjects can finally conquer 
their difficulties, unpleasant attitudes, it ap- 
pears, must be subdued by pleasant ideas and 
a satisfying feeling-tone. The effectiveness of 
Test I (affectivity score), in revealing signifi- 
cant differences between the phobia and non- 
phobia groups is verified, therefore, by the 
“disgust” words in Test I which also show 
significant differences. 

The “self-feeling” words in Test I and the 
total affectivity scores on Tests I, III, and IV
do not show significant differences between 
the phobia and non-phobia groups. It seems 
that self-feeling does not intensify highly spe- 
cialized unnatural fears; however, this ques- 
tion should be attacked by further experi- 
mentation. 

The “fear” words in Test I, and topics 
which cause the subjects worries or make them 
nervous in Test IV, show no significant differ- 
ences between the two groups of subjects. 
The lack of demonstrative differences between 
the subjects on the “fear” words in Test I may 
be due to the fact that phobias are extremely 
specialized fears, i.e., an individual may have 
one extreme fear and yet not fear the ordi- 
nary things generally held as disturbing to 
the majority. 

“Sex” words in Test I show no significant 
differences between the means of the two 
groups. This is what would be expected. 
Furthermore, mere disapproval of certain sub- 
jects, Test III, does not show significant dif- 
ferences. It appears that the dissatisfying, 
unpleasant, disgusting situations must be suf- 
A STUDY IN THE PREDICTION OF COLLEGE
FRESHMAN MARKS* 


T. D. D. Quai 
University Station 
Enid, Oklahoma 


I. INTRODUCTION 


Colleges have long been interested in the 
prediction of student scholarship, especially 
the minimum degree of scholarship required 
by the institution for graduation. This atti- 
tude is reflected in many kinds of require- 
ments, nearly all of which have had as their 
primary aim the forecasting of the ability of 
the student to succeed in the institution, so 
far as academic achievement is concerned. 

It had been assumed, until recent years, 
that evidence of success in the secondary 
school was sufficient indication of ability to 
succeed in college or university. This was a 
more logical attitude to take in times past, 
when Greek, Latin, and religion constituted 
the major offerings in both secondary school 
and college. However, as curricular offerings 
expanded and diversified in both secondary 
and higher institutions of learning, it was ob- 
served that many who were admitted to col- 
lege failed, that previous education was often 
inadequate, and that the correlation between 
success in secondary education and in higher 
education was low. These observations,
coupled with the fact that either law or pub- 
lic opinion forces public institutions of higher 
learning to admit all high school graduates, 
have led to the use of many measures, the 
purpose of which is to evaluate the ability of 
the freshman after he has been admitted. 

Recent trends, in general, have been from 
traditional and subjective criteria of freshman 
ability to experimental and objective meas- 
ures. Among the measures used are: high 
school marks, mental tests, aptitude tests, 
achievement tests, character tests, and the pat- 
tern of high school subjects. The first two 
mentioned are by far the most frequently 
used of the measures listed. 

The results, on the whole, have been disap-
pointing, largely because they were true of the
group as a whole but did not apply to indi-
viduals or to types of students, with any de-
termined degree of definiteness. The findings
did not reveal what was happening to the boy
as distinguished from the girl, or to the
bright student as compared with the dull.


* Summary of a thesis for the Ph.D. Jniversi

Studies have been limited, in nearly all 
cases, to one or two predictive factors in a 
given institution. Procedure has consisted, 
for the most part, in running a simple corre- 
lation between the predictive factor and col- 
lege marks, the authors failing to validate 
their predictions by concrete application to 
the student body. 


Further development of the problem will 
doubtless be influenced by a number of fac- 
tors, including the philosophy of education on 
the college and university levels, further 
growth of enrollments in colleges and univer- 
sities, with accompanying developments in 
personnel work, conditions and differentia- 
tions of employment among the educated 
classes, further developments and refinements 
in the science of testing, including character 
traits, and the development of more reliable 
means of determining high school and college 
marks. 


Because of curricula varying both in con- 
tent and difficulty in different institutions of 
higher education, varying medians of ability 
in freshman classes in different colleges and 
universities, varying methods of teaching and 
standards of achievement in different institu- 
tions of higher learning, wide differences in 
faculty, buildings, and equipment of different 
institutions, and many other variables too 
numerous to mention here, no uniform form- 
ulae or criteria can be prescribed for the eval- 
uation of ability of freshmen to succeed in all 
colleges and universities. Each institution 
must, in a measure, settle the problem for it- 
self. Probably the best scientific attack on 
the problem of prediction is to select an insti- 
tution of higher learning as nearly typical of 
a general class as possible and study inten- 
sively a number of predictive factors in that 
institution as a basis for the evaluation of 
the ability of entering freshmen to succeed in 
all similar institutions. That is the plan of
this study, and Phillips University is the in- 
stitution chosen for the investigation. 

The agents used for prediction in the study 
are: (a) The American Council Test, as a 
measure of intelligence; (b) The Ohio State 
University Psychological Examination, as a 
measure of intelligence; (c) high school 
marks, as a complex measure, including intel- 
ligence, academic achievement, character 
traits, and perhaps other factors; (d) first 
semester college marks, used both as the thing 
predicted by the other agents, (a), (b), (c), 
and (d) and as itself an agent for the predic- 
tion of second semester marks; and (e) the 
Purdue Placement Test in English, as a check 
on the value of special aptitude tests in pre- 
dicting marks in a special field. 

No attempt is made to justify, as predictive 
agents, any of the measures used, or to in- 
crease the validity or reliability of any of 
them. The purpose of the study is to deter- 
mine, as far as the techniques employed are 
capable of determining, the amount of pre- 
dictive value in forecasting college marks that 
these measures have, individually and collec- 
tively, theoretically and practically, without 
reference to the question as to whether they 
should have more of such value, and also 
without reference to the means of increasing 
said value. 

The investigation was made in Phillips Uni- 
versity, because in all its departments it is 
predominantly a liberal arts school, the type 
in which most of the measures used have been 
validated; it has a cosmopolitan student 
body; and the data for the study were read- 
ily accessible to the author. 

The sources of data used in the study are: 
(1) the scores resulting from giving the Ohio 
State University Psychological Examination, 
Form 18, to freshmen at Phillips University in 
December, 1934; (2) the scores resulting from 
giving the American Council Test to Phillips 
University Freshmen, in September, 1934; 
(3) scores resulting from giving the Purdue
Placement Test in English to freshmen at 
Phillips University, in September, 1934; (4) 
the transcript of high school marks for each 
freshman who had Ohio State University Psy- 
chological Examination scores, American 
Council Test scores, and Purdue Placement 
Test in English scores on file in the registrar’s 
office for the year 1934-1935; and (5) marks 
received during the first and second semesters 
of 1934-1935 in Phillips University, by each 
student in the study. One hundred forty
freshmen, seventy-five boys and sixty-five 
girls, were found to have complete records in 
these five measures. 

From these data eighty-six simple correla- 
tions were calculated as a basis of direct com- 
parisons, as well as multiple correlations and 
regression equations needed in the study. 
These interrelationships are shown in Table I. 

The value of the measures used in predict- 
ing marks in specific subjects or subject-fields 
is sought by the use of the differential corre- 
lation formula (recently developed and vali- 
dated by Segel), comparing results with actual
marks obtained by the students. The Criti- 
cal Point, that is, the mark below which a 
student may not fall and still be likely to suc- 
ceed, academically, at Phillips University, is 
determined. 


II. LIMITATIONS AND RELIABILITY OF DATA


College marks are used as the basis of many 
decisions of great importance to students. 
They are used to forecast his further success 
as a college student; to determine his fitness 
to engage in certain occupations, his eligibility 
to honor societies and athletic activities, his 
merits in contests for prizes and scholarships, 
his qualifications to pursue specific courses of 
study; and to serve as a basis on which to 
predict his general success in life. Since the 
student’s destiny so largely depends upon his 
college marks, and they in turn are largely 
determined by the total situation at a given 
school, factors influencing marks at that 
school should be carefully studied as a basis 
for predicting, before or soon after the student 
enters the school, as nearly as possible what 
the student’s marks will be. With this in 
mind the situation at Phillips University is 
presented in the following paragraphs to serve 
as a background for this study. 

Phillips University uses a typical five-point 
marking system as follows: S (superior) is 
given to about the upper ten per cent of the 
class; G (good) is given to the next lower 
twenty-five per cent; M (medium) is given to 
the next thirty-five per cent; I (inferior) is 
given to the lowest fifteen per cent of the class 
passing; U (unfinished) satisfactory work, or 
C (conditioned) unsatisfactory work, or F 
(failure) is given to those not passing the 
course. A mark of S earns for the student 
three honor or credit points for each semester 
hour of credit received in the course; a G 
earns two; M, 1; I, 0; U, C, and F, -1 each.
The number of credit points earned divided 
by the number of semester hours of enroll- 
ment equals the scholarship quotient of the 
student. A scholarship quotient of 1 is re- 
quired for graduation, and a student with a 
smaller quotient is on probation until such a 
time as his quotient equals or excels 1, or un- 
til the student withdraws from school. 
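
The scholarship quotient described above may be illustrated with a short Python sketch. It is offered as an illustration under stated assumptions, not as the registrar's actual procedure: the function name and the sample schedule are hypothetical, and the marks U, C, and F are taken as minus one credit point per semester hour.

```python
# Credit points per semester hour for each mark, as described in the text.
# U, C, and F are assumed to carry -1 point each.
CREDIT_POINTS = {"S": 3, "G": 2, "M": 1, "I": 0, "U": -1, "C": -1, "F": -1}


def scholarship_quotient(courses):
    """Return credit points earned divided by semester hours of enrollment.

    `courses` is a list of (mark, semester_hours) pairs.
    """
    total_hours = sum(hours for _, hours in courses)
    if total_hours == 0:
        raise ValueError("no semester hours of enrollment")
    total_points = sum(CREDIT_POINTS[mark] * hours for mark, hours in courses)
    return total_points / total_hours


# Hypothetical freshman schedule; not taken from the data of the study.
schedule = [("S", 3), ("G", 3), ("M", 4), ("F", 2)]
quotient = scholarship_quotient(schedule)

# A quotient below 1 places the student on probation.
on_probation = quotient < 1
```

Because each mark is weighted by its semester hours, a failure in a two-hour course lowers the quotient less than a failure in a four-hour course.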


For the purposes of this study the literal 
symbols of the marking system were quanti- 
fied by assigning a value of 5 to S, 4 to G, 
3 to M, 2 to I, and 1 to U, C, or F. This 
was done to avoid zero and negative quanti- 
ties in the statistical treatment of the data. 
Literal marks and their equivalents on high 
school transcripts were quantified in the same 
way. 

Fields and subjects in which college marks 
were used in this study are English (compo- 
sition and rhetoric); mathematics (algebra, 
trigonometry, solid geometry, and analytical 
geometry); science (physics, chemistry, biol- 
ogy and geology); foreign language (French, 
Spanish, German, Latin and Greek); and his- 
tory and social science (American history and 
government, European history, and psychol- 
ogy). Courses in these subjects are, for the 
most part, continuous through the freshman 
year, being, in fact, year subjects rather than 
semester subjects. In 1934-1935 ninety-six 
per cent of the students in English I who re- 
mained through the year took English II, and 
ninety-one per cent of these had the same 
teacher they had the first semester. Seventy- 
six per cent of the students in mathematics 
(algebra-trigonometry) the first semester took 
algebra-analytics, or algebra-solid geometry 
the second semester and eighty-five per cent 
of these had the same teacher the second 
semester. Eighty-seven per cent of freshmen 
students in foreign language the first semes- 
ter continued the same subject the second 
semester with ninety-two per cent having the 
same teacher. Ninety-four per cent of fresh- 
men in science continued the same subject the 
second semester with ninety-five per cent hav- 
ing the same teacher. Fifty-nine per cent of 
freshmen in social science continued the same 
subject the second semester, and eighty-four 
per cent made no change in teachers. 


The subjects used in the study were fresh- 
men at Phillips University for the year 1934— 
1935. One hundred forty students were 
found with complete records in the measures 
used in the study. More could have been
included if fewer measures had been employed.
It was thought, however, that an intensive
study of a smaller number would be prefer-
able to a superficial investigation of a larger
group. The size of the group, beyond the 
number required to give a fairly smooth curve 
of distribution, is important only to the extent 
that it reduces the sampling errors. The de- 
gree of influence that larger numbers might 
have in an investigation, granting that the 
group used is representative, may be illus- 
trated by reference to the probable error of 
the mean of scores made on the Ohio State 
University Psychological Test by the group 
used in this study. The mean is 72.84, and 
the probable error of the mean is 1.18. Now, 
in order to cut this error in half (to .59) by 
increasing the size of the group, N (140) 
must be multiplied by 4, that is 560 cases 
must be included in the study (4, p. 125). 
If we wish to cut the error to one-eighth of its 
size (to .15), N (140) must be multiplied by 
64, that is, 8,960 cases must be included in 
the study. 
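
The relation between the size of the group and the probable error of the mean may be sketched as follows. The sketch assumes, as the figures above imply, that the error varies inversely with the square root of N; the function names are hypothetical.

```python
import math


def cases_needed(current_n, reduction_factor):
    """Cases required to divide the probable error of the mean by
    `reduction_factor`, assuming the error varies as 1 / sqrt(N)."""
    return current_n * reduction_factor ** 2


def reduced_error(current_error, current_n, new_n):
    """Probable error of the mean after enlarging the group to `new_n`."""
    return current_error * math.sqrt(current_n / new_n)


n = 140                # freshmen with complete records
probable_error = 1.18  # probable error of the mean Ohio score

for factor in (2, 8):
    new_n = cases_needed(n, factor)
    print(factor, new_n, round(reduced_error(probable_error, n, new_n), 2))
```

Halving the error requires four times as many cases, and cutting it to one-eighth requires sixty-four times as many, so the gain from a larger group diminishes rapidly.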


The normalcy of the curve of distribution 
of the Ohio State University Psychological 
Examination scores received by the students 
of this study, as shown by the superposition 
of the normal curve upon it, provides evidence 
that the group used is adequately representa- 
tive of the larger group of college freshmen. 
A technique developed by Dickey (2, p. 439) 
was used to reveal this relationship. G (nor-
malcy) is found to be .90 ± .0171. The G
of 1255 University of Oklahoma Freshmen on
the same test given at the same time is
.90 ± .008.


The writer is aware that the reliability of 
college marks is a crucial factor in the final 
solution of the problem under consideration. 
The reliability of semester marks at Phillips 
University is unknown, and at present, in- 
capable of being determined with any high 
degree of accuracy. This is so, chiefly, be- 
cause the teachers have no common basis on 
which to assign the marks. They also have 
few objective standards for the evaluation of 
the marks assigned. An inquiry was made 
among the teachers of Phillips University to 
determine the factors considered by them in 
assigning semester marks. It revealed the 
fact that many other things besides academic 
achievement are considered in making up the 
students’ marks. Thirteen different factors
are used. Three teachers considered all these
factors, and one considered only three, eleven
being the median number employed. There
were no two teachers who used the same set
of factors with anything like the same degree
of emphasis on corresponding factors, nor was
there agreement among the faculty on the rel- 
ative importance of any single factor. The 
situation is quite chaotic. 

A study of teachers’ marks based upon second
semester examinations at Phillips University
for 1934-35 showed a reliability coefficient
of .780 ± .073. This coefficient was
used for validation of results obtained in the 
study. 

On the basis of data given by Ruch (9, p. 
107) and (10, p. 53) and Symonds (14, p.
401), a reliability coefficient estimated at .60 
for high school marks is used in this study for 
validation. Published reliability coefficients 
for the other measures used in the study are: 
Ohio State University Psychological Examination
.92 ± .003, (7, p. 2048.1); American
Council Test .95 (13, p. 135); Purdue Place- 
ment Test in English .95 (8, p. 3). 

The much higher reliability of the stand- 
ardized tests as compared to high school and 
college marks as presented here is especially 
noticeable. This situation will probably con- 
tinue to be so until there is better agreement 
among teachers as to what should determine 
course marks, and until there is a more uni- 
form standard for evaluating the factors in- 
cluded in marks. The importance of this 
wide difference in the reliability of the meas- 
ures used in this study will become manifest 
in a later section of the investigation. 


III. STATISTICAL ANALYSIS OF
RELATIONSHIPS


Intercorrelations among measures used in 
the study are shown on Table I by the numer- 
als one to sixteen along the rows, and two to 
sixteen along the columns, these figures being 
employed to identify measures used in regres- 
sion equations. The first number in each cell 
in the body of the table is r, the middle one is 
the probable error of r, and the lower one is r 
corrected for attenuation. 
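
The text does not spell out how the third figure in each cell was computed. A minimal sketch, assuming the conventional Spearman correction for attenuation and a hypothetical function name, is given here; the inputs are coefficients quoted elsewhere in the text, and whether Table I used exactly these reliabilities is an assumption:

```python
from math import sqrt


def correct_for_attenuation(r_xy, reliability_x, reliability_y):
    """Spearman's correction: the observed r divided by the geometric mean
    of the two reliability coefficients."""
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / sqrt(reliability_x * reliability_y)


# Illustrative inputs only: r = .522 between the Ohio State examination and
# first semester marks, with reliabilities of .92 and .780 from the text.
corrected = correct_for_attenuation(0.522, 0.92, 0.780)
print(round(corrected, 3))
```

Because the divisor is at most 1, the corrected coefficient is never smaller than the observed one, and it can exceed 1.00 when the reliabilities are low.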

A brief study of Table I reveals the fact 
that of the first three measures, which are 
used as predictive agents only, the Ohio State 
University Psychological Examination scores 
yield the highest correlation with the first 
semester average of college freshman marks
(.522), high school average marks the second
(.516) and American Council Test scores the 
third (.408). The difference between .522 
and .408 is not statistically significant, but it 
is enough to indicate some probability (ninety 
chances in a hundred) that a true difference 
exists in the direction here indicated. The 
difference (.114) is .475 as large as it should 
be to be entirely significant. The average of 
these correlations is .482. When correlated 
with second semester average of college marks 
(column five) these three measures rank, high 
school average first (.597), Ohio State Univer- 
sity Psychological Examination scores second 
(.557) and American Council Test scores 
third again (.412), the average of the three 
correlations being .522. The correlation of 
the same three measures with first semester 
college freshman marks in five specific subject 
fields shows that the Ohio State University 
Psychological Examination scores rank first 
with an average r of .473; high school aver- 
age marks rank second with .385; and Amer- 
ican Council Test scores rank third with an r 
of .367, the average of the three being .410. 
The rank order of these three measures is the 
same when they are correlated with second 
semester college marks in the same five sub- 
ject fields. 

The average correlations of these three 
measures with each of the subject fields in- 
cluded in the study are as follows: First 
semester, mathematics .553, English .440, so- 
cial science .362, science .353, and foreign lan- 
guage .346; second semester, mathematics 
.764, English .452, social science .437, science 
.331 and foreign language .371. It is thus 
seen that the average correlations of these 
measures with marks in the specific subject 
fields have the same rank order the second 
semester as the first, except that science and 
foreign language shift ranks at the lower end 
of the series. These findings would seem to 
indicate that there is a real difference in the 
ability of the Ohio State University Psycho- 
logical Examination, high school average 
marks and American Council Test to predict 
marks in the different college freshman fields 
included in this study, when these correla- 
tions are averaged. 

Correlation of first semester college fresh- 
man marks with second semester college fresh- 
man marks (.784) is much higher than any 
of the other correlations in Table I involving 
general scholarship. This is probably due 
chiefly to the fact that most freshmen at Phillips
University, as is probably true also in
most small typical liberal arts colleges, con- 
tinue their first semester courses through the 
second semester without change of teachers 
and with little change in class personnel. 
This situation makes the first semester aver- 
age of freshman marks at Phillips University 
a valuable agent for the prediction of second 
semester marks. However, in spite of its ad- 
vantage in respect to general scholarship, it 
is but little superior to Ohio State University 
Psychological Examination or high school av- 
erage marks for the prediction of second 
semester marks in the separate college subject 
fields included in this study, as can be ob- 
served in Table I, rows one, three and four, 
columns eleven to fifteen inclusive. 

The average correlation of high school 
marks in subject fields with college marks in 
the corresponding fields for the first semester 
(.482; average of first column in Table II), 
is the same as that of mental tests and high 
school marks with average college freshman 
marks for the first semester (.482). This 
might seem to lend some weight to the re- 
cently proposed and rapidly developing hy- 
pothesis that given amounts of academic prep- 
aration in specified high school subject fields, 
as a prerequisite to college entrance, are not 
essential to college success (Douglas—3, p. 
283). Caution against such conclusion, how- 
ever, is presented in the fact that the correla- 
tion of high school marks in the separate sub- 
ject fields with second semester college fresh- 
man marks in the same subject fields (.590; 
average of the fourth column in Table II),
is nearly .07 higher than is the average of
mental tests and high school average with 
second semester average of college marks, 
which is .522, as indicated above. This fact, 
unless accounted for in some way not now ap- 
parent to the writer, would tend to prolong 
the traditional belief in the efficacy of high
school preparation in specified subject fields
to enable the student to achieve college suc- 
cess in those fields. 

From the correlations calculated by a 
varied combination of the measures used in 
the study, the following were selected as prob- 
ably the most efficient for presenting the dif- 
ferent phases of prediction attempted in gen- 
eral scholarship: 


1. The multiple correlation of first semester
college marks with American Council Test
scores and high school average marks is .590,
with a probable error of estimate of ± .445;
and the regression equation is
$X_{\text{first sem.}} = .004\,X_{\text{Amer. Council}} + .566\,X_{\text{H.S. avg.}} + .342$.

2. The multiple correlation of first semester
college marks with Ohio State University
Psychological Examination scores and high
school average marks is .605, with a probable
error of estimate of ± .436; and the regression
equation is
$X_{\text{first sem.}} = .014\,X_{\text{Ohio State}} + .500\,X_{\text{H.S. avg.}} + .249$.

3. The multiple correlation of first semester
college marks with Ohio State University
Psychological Examination scores and American
Council Test scores and high school average
marks and English Aptitude Test scores
is .621, with a probable error of estimate of
± .425; and the regression equation is
$X_{\text{first sem.}} = .01\,X_{\text{Ohio State}} - .0003\,X_{\text{Amer. Council}} + .428\,X_{\text{H.S. avg.}} + .0005\,X_{\text{Eng. Apt.}} - .040$.

4. The multiple correlation of second semester
college marks with first semester college
marks and high school average marks is
.829, with a probable error of estimate of
± .295; and the regression equation is
$X_{\text{second sem.}} = .590\,X_{\text{first sem.}} + .445\,X_{\text{H.S. avg.}} - .270$.

5. The multiple correlation of second semester
college marks with first semester college
marks and Ohio State University Psychological
Examination scores is .802, with a
probable error of estimate of ± .316; and the
regression equation is
$X_{\text{second sem.}} = .671\,X_{\text{first sem.}} + .008\,X_{\text{Ohio State}} + .580$.


TABLE II

SUBJECT-WITH-SUBJECT CORRELATIONS OF HIGH SCHOOL MARKS WITH COLLEGE FRESHMAN
MARKS, WITH PROBABLE ERRORS AND CORRECTIONS FOR ATTENUATION

                 (First Semester)                  (Second Semester)
                 Corre-   Probable   Corrected     Corre-   Probable   Corrected
Subject          lation   error      correlation   lation   error      correlation
[illegible]      .541     .033       .703          .551     .031       .735
Mathematics      .439     .096       .954          .804     .044       1.340
[illegible]      .525     .041       .720          .296     .061       .406
[illegible]      .477     .051       .636          .637     .046       .671
Soc. Science     .482     .067       .786          .664     .048       1.210

6. The multiple correlation of second semester
college marks with first semester college
marks and high school average marks
and Ohio State University Psychological
Examination scores and English Aptitude Test
scores is .836, with a probable error of estimate
of ± .292; and the regression equation is
$X_{\text{second sem.}} = .516\,X_{\text{first sem.}} + .418\,X_{\text{H.S. avg.}} + .003\,X_{\text{Ohio State}} + .0005\,X_{\text{Eng. Apt.}} - .235$.

7. The correlation of first semester college
marks with Ohio State University Psychological
Examination scores is .522, with a probable
error of ± .041 and a probable error of
estimate of ± .319; and the regression equation is
$X_{\text{first sem.}} = .021\,X_{\text{Ohio State}} + 1.634$.

8. The correlation of first semester college
marks with American Council Test scores is
.408, with a probable error of ± .044, and a
probable error of estimate of ± .333; and
the regression equation is
$X_{\text{first sem.}} = .007\,X_{\text{Amer. Council}} + 2.031$.


9. The correlation of first semester college
marks with high school average marks is .516,
with a probable error of ± .042, and a probable
error of estimate of ± .362; and the regression
equation is
$X_{\text{first sem.}} = .697\,X_{\text{H.S. avg.}} + .451$.

10. Correlation of second semester college
marks with first semester college marks is
.784, with a probable error of ± .021, and a
probable error of estimate of ± .219; and the
regression equation is
$X_{\text{second sem.}} = .765\,X_{\text{first sem.}} + .774$.

First semester marks were predicted by the 
regression equations in 1, 2, 3, 7, 8, and 9;
and second semester marks by the regression
equations in 4, 5, 6, and 10.
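
The probable errors attached to the simple correlations in paragraphs 7 to 10 appear consistent with the conventional formula for the probable error of r. The sketch below assumes that formula and assumes every coefficient rests on all 140 cases; the function name is hypothetical:

```python
from math import sqrt


def probable_error_of_r(r, n):
    """Conventional probable error of a product-moment correlation:
    0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)


# N = 140 is the group size given in the text.
pe_ohio_state = probable_error_of_r(0.522, 140)
pe_high_school = probable_error_of_r(0.516, 140)
print(round(pe_ohio_state, 3), round(pe_high_school, 3))
```

The error shrinks as r approaches 1.00, which is why the correlation between the two semesters carries the smallest probable error of the four.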

The regression equation was used to predict 
the average of semester college marks for each 
student. These marks were correlated with 
achieved marks, and the coefficient compared 
with the correlation coefficient in the regres- 
sion situation by which the marks were pre- 
dicted. Means and sigmas of predicted marks 
were calculated and compared with those of 
achieved marks. The predicted mark of each 
student was compared with his achieved mark 
and the difference found. The sum of these 
was divided by N to get the average difference 
between predicted marks and achieved marks. 
The range of difference was found by noting 
the lowest difference, as well as the highest, 
between the predicted mark and the achieved 
mark of any student in the study. All these 
calculations were made for each of the re- 
gression equations in situations 1 to 10, in- 
clusive. The results of these operations are 
shown in Table III.
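
The checks just described can be sketched as follows. The marks for the five students are invented for illustration and are not data from the study; the function names are likewise hypothetical:

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sigma(values):
    """Standard deviation about the mean, dividing by N."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sigma(xs) * sigma(ys))


def evaluate(predicted, achieved):
    """Summaries of the kind reported for each regression situation."""
    diffs = [abs(p - a) for p, a in zip(predicted, achieved)]
    return {
        "mean_predicted": mean(predicted),
        "mean_achieved": mean(achieved),
        "sigma_predicted": sigma(predicted),
        "sigma_achieved": sigma(achieved),
        "average_difference": mean(diffs),
        "range_of_difference": (min(diffs), max(diffs)),
        "r": pearson_r(predicted, achieved),
    }


# Hypothetical marks for five students.
achieved = [4.2, 3.5, 3.0, 2.6, 2.2]
predicted = [3.8, 3.4, 3.1, 2.9, 2.4]
summary = evaluate(predicted, achieved)
```

Each entry of `summary` corresponds to one column of the comparison: the two means, the two sigmas, the average difference in grade points, the range of difference, and the correlation of predicted with achieved marks.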


TABLE III

REGRESSION SITUATIONS, MEANS AND SIGMAS, AVERAGE DEVIATION AND RANGE OF DIFFERENCE,
AND CORRELATIONS OF PREDICTED AND ACHIEVED MARKS, ALSO THE CORRELATIONS
OF THE REGRESSION SITUATIONS

Regres-                        Diff. in    Range of   r (predicted
sion (a)  Mean (b)  Sigma (b)  Grade Pts   Diff.      & Ach. Marks)   R or r

 1        3.19      .81
          3.15      .49        .492        0-1.54     .593 ± .035     .590
 2        3.19      .81
          3.18      .53        .480        0-2.41     .613 ± .034     .605
 3        3.19      .81
          3.19      .60        .437        0-1.49     .618 ± .034     .621
 4        3.33      .79
          3.37      .62        .341        0-1.76     .834 ± .021     .829
 5        3.33      .79
          3.32      .67        .320        0-2.15     .821 ± .022     .802
 6        3.33      .79
          3.37      .66        .314        0-1.80     .842 ± .020     .836
 7        3.19      .81
          3.17      .63        .520        0-2.67     .538 ± .040     .522
 8        3.19      .81
          3.18      .61        .555        0-2.82     .413 ± .044     .408
 9        3.19      .81
          3.14      .65        .513        0-2.12     .508 ± .042     .516
10        3.33      .79
          3.29      .64        .335        0-2.01     .791 ± .023     .784

(a) These regression situations are described in numbered paragraphs in the text.
(b) Figures above refer to achieved marks, and those below refer to predicted marks, in the
columns of means and sigmas.






















A brief examination of Table III reveals 
the fact that the means of predicted and 
achieved marks are approximately equal, the 
largest variation between the two being .050
of a grade point, which occurs in the case of 
regression situation (9). Sigmas of predicted 
marks are uniformly smaller than those of 
achieved marks, the greatest variation being 
.320 of a grade point, which is in the case of
regression equation (1). The smallest dif- 
ference in sigmas is seen to be .120 of a grade 
point, in the case of regression equation (5). 
Predicted marks were usually lower than 
achieved marks in the upper parts of the dis- 
tribution, whereas in the lower parts the re- 
verse was true. This, of course, accounts for 
the fact that sigma of predicted marks was 
smaller than that of achieved marks. 
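
The shrinkage of sigma described above is a general property of least-squares prediction: when a line is fitted to a sample and used to predict within that same sample, the sigma of the predicted values equals the sigma of the achieved values multiplied by r. The sketch below illustrates this with invented scores and marks; it is not a reconstruction of the author's computations:

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sigma(values):
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sigma(xs) * sigma(ys))


def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    slope = pearson_r(xs, ys) * sigma(ys) / sigma(xs)
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept


# Hypothetical test scores and marks for six students.
scores = [60, 65, 70, 75, 80, 85]
marks = [2.4, 3.1, 2.9, 3.6, 3.3, 4.0]
slope, intercept = fit_line(scores, marks)
predicted = [slope * s + intercept for s in scores]
shrinkage = sigma(predicted) / sigma(marks)  # equals r for this sample
```

The means of predicted and achieved marks coincide in such a fit, while the predicted marks are drawn in toward the mean at both ends of the distribution.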

The average difference between predicted 
marks and achieved marks ranged from .314 
of a grade point (about one-third of a grade- 
symbol) in the case of correlation (6) to .555 
of a grade point (a little over half of a grade- 
symbol) in the case of regression equation 
(8). The range of difference was greatest in 
equation (7), being 2.67 grade points, and 
least in equation (3) with 1.49 grade points. 
The correlation between predicted marks and 
achieved marks was, in each case, approxi- 
mately the same as the correlation in the re- 
gression situation used as the basis of the 
predicted marks. For example, the correla- 
tion in (1) is .590 as compared with .593 for 
the correlation of predicted marks with 
achieved marks. 

It may be noted that prediction of first 
semester marks by the use of regression equa- 
tion (3) yields a smaller average difference 
from achieved marks (.437 of a grade point) 
than any other correlation used to predict first 
semester marks. When second semester 
achievement marks are placed alongside first 
semester achievement marks, and each mark 
predicted by (3) is compared to the corre- 
sponding semester mark (achieved) that is 
nearest in size to it, the average error of pre- 
diction is cut from .437 of a grade point to 
.356 of a grade point; and when the ten per 
cent most widely variant cases are eliminated, 
this error is cut to .261 of a grade point for 
the remaining ninety per cent of the students 
in the study, with a range of 0 to .96 of a 
grade point. 

When second semester marks are predicted 
from first semester marks, or a multiple correlation
including these marks, the average difference
between predicted marks and achieved
marks is much smaller than it is in the other 
predictions. This is shown in Table III, re- 
gressions 4, 5, 6, and 10. The most efficient 
of these regressions is (6), and it is a five- 
variable equation, just as (3), a five-variable 
correlation, is the most effective combination 
of the measures used in the study for the 
prediction of first semester marks. When 
the ten per cent most widely variant cases 
are eliminated the error of prediction with 
regression equation (6) is cut to .217 of a 
grade point for the remaining ninety per cent 
of the students, with a range of 0 to .60 of a
grade point. 
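
The effect of setting aside the ten per cent most widely variant cases can be sketched as below. The ten differences are invented for illustration, and the function name is hypothetical:

```python
def average_difference(differences, drop_fraction=0.0):
    """Mean absolute difference after dropping the largest drop_fraction
    of cases, together with the range of the differences retained."""
    ordered = sorted(differences)
    n_drop = round(len(ordered) * drop_fraction)
    kept = ordered[: len(ordered) - n_drop]
    return sum(kept) / len(kept), (kept[0], kept[-1])


# Hypothetical absolute differences between predicted and achieved marks.
diffs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.1, 0.2, 0.3, 0.4, 2.5]
full_mean, full_range = average_difference(diffs)
trimmed_mean, trimmed_range = average_difference(diffs, drop_fraction=0.10)
```

A single widely variant case can carry a large share of the average error, so both the average and the range contract sharply once it is removed.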

The findings in Table III are true when ap- 
plied to the 1934-1935 Phillips University 
Freshman Class as a whole, but they give no 
hint as to what is happening in the different 
levels of the distribution. 

In order to show how efficiently the marks 
of students of varying abilities can be pre- 
dicted by the use of regression equations (1) 
to (10) a series of ten tables showing the 
decile rank of predicted marks as compared 
with achieved marks was prepared. Tables 
IV and V, with accompanying analyses, are 
here presented as an illustration of how the 
data were treated in these ten tables. Tables 
VI and VII are a summary of findings in the 
ten tables. 

Table IV, presenting predictive results ob- 
tained by use of regression (3) compared with 
achieved marks, is presented to illustrate the 
efficiency of this technique in predicting first 
semester college freshman marks. Table V 
deals similarly with second semester marks. 
Regression equation (3) is chosen for analysis
because it is the result of the best combina- 
tion of variables discovered in the study for 
the prediction of first semester marks. 

Both predicted and achieved marks were 
ranked individually from highest to lowest. 
These marks were then blocked into deciles 
and placed in the table. The individual stu- 
dents are represented by numerals 1 to 140. 
This was done in order to identify any given 
student in the distribution of achieved marks 
as compared with predicted marks. It also 
enables one to study the relative predictive
efficiency of the various combinations of
measures used, as applied to the good student
or the poor, to male or female (the underscored
numbers are girls), as well as to
give a check-up on the relative predictive
efficiency of a given measure or group of measures,
in the different deciles of the distribution.
The numeral just above the number
representing each student in the distribution
of predicted marks shows in what decile that
student may be found in the distribution of
achieved marks. For example, student number
4, in the upper left corner of decile ten of
predicted marks, is found in decile eight of
achieved marks. Students 5 and 20 in decile
ten of predicted marks are also in decile ten
of achieved marks, and therefore carry no
numbers indicating a shift to a different decile
in the other distribution.


TABLE IV

DECILE RANKS OF PREDICTED FIRST SEMESTER MARKS OF 140 PHILLIPS UNIVERSITY FRESHMEN,
1934-1935, BASED UPON THE CORRELATION R (FIRST SEMESTER MARKS) (OHIO STATE UNIVERSITY
PSYCHOLOGICAL TEST SCORES + HIGH SCHOOL AVERAGE + AMERICAN COUNCIL
TEST SCORES + ENGLISH APTITUDE TEST SCORES) = .621, COMPARED WITH CORRESPONDING
RANKS ACHIEVED

Deciles      Predicted Rank                        Achieved Rank
5.00-4.31    4, 5, 20, 24, 26, 45, 136, 139        7, 19, 20, 45, 85, 114, 116
             137, 27, 58, 68, 87, 88, 94           136, 137, 5, 6, 68, 87, 88, 139
4.30-3.83    7, 2, 19, 52, 59, 65, 114             80, 99, 128, 129, 90, 105, 119
             116, [?], 88, 90, 119, [?], 9         8, 88, [?], 75, 86
3.82-3.57    13, 25, 60, 53, 79, 140, 129          1, 4, 13, 14, 52, 62, 124, 127
             31, 32, 51, 87, 9, 70, 105            15, 27, 30, 54, 115, 131
3.56-3.38    14, 80, 99, 106, 96, 126, 134         2, 25, 26, 59, 67, 74, 83, 113
             [illegible]                           [illegible]
3.37-3.13    1, 21, 64, 74, 101, 124, 127          24, 40, 53, 64, 106, 107, 110
             22, 44, 28, 61, 77, 86                18, 31, 44, 57, 104, 120, 135
3.12-2.94    34, 39, 92, 85, 102, 83, 15, 18       36, 65, 79, 93, 126, 11, 17
             28, 48, [?], [?], [?], [?]            29, 9, 8, 6, [?], [?]
2.93-2.79    29, 108, 112, 121, 130, 67, 43        3, 35, 92, 96, 101, 108, 121
             [?], [?], 78, [?], 115, 31, 16        [?], 140, 43, 6, 69, 81, 138
2.78-2.50    35, 36, 41, 46, 62, 113, 122          21, 29, 34, 39, 41, 100, 122
             17, 50, 63, 76, 81, 98, 111           134, 16, 28, 61, 98, 111, 118
2.49-2.17    49, 93, 95, 107, 128, 84, 133         12, 42, 46, 71, 82, 102, 125
             88, 69, 91, [?], [?], 138, 42         50, 1, 97, 103, 133, 132
2.16-[?]     3, 8, 10, 12, 40, 47, 56, 71          8, 10, 47, 49, 56, 60, 95
             100, 125, 89, 103, 123, 82            112, 133, 48, 73, 76

Superscript shows rank of student in other distribution.
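
The blocking of ranked marks into deciles, and the comparison of each student's predicted decile with his achieved decile, can be sketched as follows. The twenty students and their marks are invented for illustration, ties are not treated specially, and the function name is hypothetical:

```python
def decile_ranks(marks):
    """Map each student number to a decile, 10 for the highest tenth."""
    ordered = sorted(marks, key=marks.get, reverse=True)
    n = len(ordered)
    return {student: 10 - (position * 10) // n
            for position, student in enumerate(ordered)}


# Twenty hypothetical students with distinct predicted marks.
predicted = {s: 5.0 - 0.15 * s for s in range(1, 21)}
achieved = dict(predicted)
achieved[1], achieved[20] = achieved[20], achieved[1]  # one large reversal

pred_decile = decile_ranks(predicted)
ach_decile = decile_ranks(achieved)

# Students whose decile agrees in both distributions carry no superscript;
# the others would carry their decile in the other distribution.
exact = sum(pred_decile[s] == ach_decile[s] for s in predicted)
shifted = {s: ach_decile[s] for s in predicted if pred_decile[s] != ach_decile[s]}
```

Here `shifted` plays the part of the superscript numerals: it records, for each displaced student, the decile in which he is found in the other distribution.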

A brief inspection of these superscript numbers
reveals the fact that only one student
predicted to be in the tenth decile, that is, between
5.00 and 4.31 grade points, fell below
the sixth decile, or 3.37 to 3.13 grade points. 
Of the fifty-six students predicted to be in 
deciles seven to ten of the distribution, which 
just about coincides with the official range of 
S (A) and G (B) marks at Phillips Univer- 
sity, forty-two are found to have achieved 
rank in those four deciles. Prediction, there- 
fore, is about seventy-six per cent correct in 
this section of the distribution. Of the forty- 
two students predicted to be in deciles four, 
five and six, the area of the distribution 
roughly coincident with the mark M (C) at 
Phillips University, only sixteen are found to 
be in those deciles in the achievement distri- 
bution. The prediction is only about thirty- 
eight per cent correct in this section of the 
distribution. Regression (3) evidently has 
low practical predictive value in this area of 
the distribution. Of the forty-two students 
predicted to be in deciles one, two, and three, 
which coincide with the area of the distribu- 
tion to which the marks of I (D) and F are 
assigned at Phillips University, twenty-four 
are found to be in those deciles. Prediction 
in this section of the distribution is about 
fifty-seven per cent correct. This is better 
than in the central portion of the distribution, 
but not so good as in the upper portion.

Another grouping of deciles in the distribu- 
tion is interesting, as well as useful, in the 
analysis of the predictive efficiency of the dif- 
ferent correlations used in the study. Phil- 
lips University requires an average mark of 
M or 1.00 (transformed to 3.00 for the pur- 
pose of this investigation) for graduation. 
This may be termed the success mark in the 
school. It may be observed that this mark 
falls in the fifth decile of the distribution of
first semester marks on Table IV. Of the 
fifty-six students predicted to fall below this 
mark, that is, in deciles one to four, thirty- 
nine of them are found to do so. So predic- 
tion in this area is about seventy per cent cor- 
rect; that is, regression equation (3) can be 
used to predict about seven out of ten fail- 
ures at Phillips University, according to this 
finding. 

The superscript numbers may be used for 
a still more detailed analysis of prediction in 
the different deciles of the distribution. For 
example, in Table IV, decile ten, it may be 
observed that student 4 has a decile displacement
of 2; student 24 has a decile displacement
of 4; student 26, 3; student 27, 2; student
58, 1; and student 94, 3. This is a
total decile displacement of 15 for the fifteen
students in decile ten, or an average decile 
displacement of 1. But the fifth decile has a 
total decile displacement of 31, or an average 
decile displacement of more than 2. 
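
The decile-ten figures above can be checked with a few lines. The displacements are those listed in the text; the variable names are hypothetical:

```python
# Displacement, in deciles, of each student predicted for decile ten who is
# found in some other decile of achieved marks (student number: displacement).
displacements = {4: 2, 24: 4, 26: 3, 27: 2, 58: 1, 94: 3}
students_in_decile = 15  # the remaining nine carry a displacement of zero

total = sum(displacements.values())
average = total / students_in_decile
print(total, average)
```

The average is taken over all fifteen students in the decile, not over the six who are displaced.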

A word might be said in explanation of why 
prediction is better in the upper and lower 
parts of the decile tables than in the central 
part. 

This situation has been understood and dis- 
cussed by statisticians, as applied to the nor- 
mal curve. It has been described as sigma 
difficulty. At the upper end of the curve 
sigma difficulty is so great that only a few 
reach these ranks, and these are scattered over 
relatively wide stretches of the base line of the 
curve. At the lower end of the curve sigma 
difficulty is so small that most students rise 
to higher positions, leaving only a scattered 
few in that portion of the curve, and they also 
appear at relatively wide intervals on the base 
line of the curve. Hence, when a distribu- 
tion curve is thrown into a decile ranking, it 
is necessary to stretch the upper and lower 
deciles over relatively wide areas of the curve 
in order to include the necessary tenths of 
students in them. 

It may be observed in Table IV that decile 
ten covers an area of .69 grade points (5.00 
to 4.31) in the distribution and that deciles 
seven to ten cover 1.62 grade points, or an 
average of .405 grade points to the decile, 
whereas deciles four, five and six cover only 
.58 grade points, or an average of only .19 
grade points per decile in this section. Decile 
prediction, therefore, should be about twice 
as accurate in the upper portion of the dis- 
tribution as in the central area, which was 
found to be so in the compared analyses of 
deciles ten and five above. This same situation
prevails, though not to such a marked
degree, in the lower area of the distribution 
as compared with the central portion, and 
accounts for the increased proficiency of pre- 
diction there. 
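
The comparison of decile widths can be restated numerically. The spans are those given in the text; the function name is hypothetical:

```python
def average_width(span_in_grade_points, n_deciles):
    """Average number of grade points covered by each decile in a section."""
    return span_in_grade_points / n_deciles


upper = average_width(1.62, 4)    # deciles seven to ten
central = average_width(0.58, 3)  # deciles four, five and six
ratio = upper / central
print(round(upper, 3), round(central, 2), round(ratio, 1))
```

A decile in the upper section is thus about twice as wide as one in the central section, so a given error of prediction in grade points is about half as likely to carry a student across a decile boundary.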

Another observation that may be made on 
Table IV is that no student predicted to be 
in deciles one to four achieved the tenth 
decile, and only one achieved as high as the 
ninth. Also, no student in the ninth and 
tenth deciles fell lower than the fifth. A stu- 
dent predicted to be in the tenth decile had 
1 in 1.67 chances of achieving the tenth 
decile; whereas a student predicted to be in 
the lower half of the distribution had only 
one in seventy-two chances of achieving the
tenth decile. So a student predicted to be in
the tenth decile is forty-three times as apt to 
achieve that rank as is a student predicted to 
be in deciles one to five inclusive. 
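
The comparison of chances can be retraced as follows. The count of nine rests on the fifteen students predicted for decile ten less the six listed earlier as displaced; the figure of seventy-two is taken from the text as stated:

```python
predicted_tenth = 15  # students predicted to be in decile ten
achieved_tenth = 9    # of these, the number found in decile ten

one_in_upper = predicted_tenth / achieved_tenth  # "1 in 1.67 chances"
one_in_lower = 72                                # "one in seventy-two chances"

times_as_apt = one_in_lower / round(one_in_upper, 2)
print(round(one_in_upper, 2), round(times_as_apt))
```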


Regression (3), Table IV, predicts first 
semester college freshman marks for boys
somewhat better than it does for girls. There 
are seventy-five boys and sixty-five girls in 
the distribution. The decile placement of 
eighteen boys, or twenty-four per cent, is 
exactly predicted, whereas the decile rank of 
twelve girls, or nineteen per cent, is exactly 
predicted. Forty-five per cent of the girls 
exceed the decile rank predicted for them, 
and forty-four per cent of the boys exceed it. 
Thirty-two per cent of the boys fall below the 
decile rank predicted and thirty-six per cent 
of the girls fail to reach it. Of the thirty- 
two boys predicted to be in deciles one to four 
inclusive (the area below the success mark at 
Phillips University), twenty-three or seventy- 
two per cent are found to have achieved that 
rank; and of twenty-four girls predicted to be 
in those four deciles, fifteen or sixty-three per 
cent are found to have achieved that rank. 
Of the twenty-nine boys predicted to be in 
deciles seven to ten (the area of S (A) and 
G (B) marks at Phillips University) twenty- 
one or seventy per cent were found to be 
there; and of the twenty-six girls predicted 
to be in those four deciles, twenty-one or 
eighty-one per cent were found to be there. 
It thus appears that regression equation (3) 
predicts the college freshman success of boys 
more accurately than that of girls in exact 
decile prediction throughout the distribution 
as well as in the degree of accuracy in pre- 
diction in deciles one to four. Prediction, 
however, is more accurate for girls in deciles 
seven to ten. The last statement is not true 
when limited to decile ten, in which prediction 
for boys is more accurate than for girls. This 
tendency towards more efficient prediction for 
boys at both ends of the distribution, based 
on regression equation 3, is also true, in gen- 
eral, of the other regressions used for predic- 
tion in this study. 


Regression equations 1, 2, 7, 8, and 9, in 
which first semester marks are also predicted, 
might be analyzed and presented as regression 
equation 3 has been in Table IV, but space 
will not be taken for such detailed treatment 
here. Similarities to, and differences from, 
the findings in Table IV as treated above, may 
be noted in Tables VI and VII, which are a
summary of findings in regression equations
1 to 10 inclusive, treated as in Table IV.
Students’ predicted rankings by the use of
regression equations 7, 8, and 9, which are
based upon simple correlations, are much more
varied than they are when derived from
regression equations 1, 2, and 3, which are
based upon multiple correlations. For example,
the rankings of student 119 by regression
equations 7, 8, and 9 are deciles five,
nine and ten, a variation of five deciles,
whereas the rankings of student 119 in
regression equations 1, 2, and 3, which are
based on multiple correlations, are nine, nine
and ten, a variation of only one decile. For
student 22 the corresponding rankings are
four, five and eight, a variation of four deciles,
as compared with five, six, and six, a variation
of one decile. For student 30 the corresponding
rankings are three, five and nine, a variation
of six deciles, as compared with seven,
seven and seven, a variation of no deciles.
These are random choices and illustrate the
point under discussion. This finding illustrates
how inadequate may be the practice of
sectioning or ranking entering college freshmen
on the results of a single measure.

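
The variation in decile rank quoted for the three students can be tabulated with a short sketch. The rankings are those given in the text, with student numbers as read there; the names are hypothetical:

```python
def decile_spread(rankings):
    """Difference between the highest and lowest decile assigned."""
    return max(rankings) - min(rankings)


# Deciles assigned by the simple regressions (7, 8, 9) and by the
# multiple regressions (1, 2, 3) to the same three students.
single_measure = {119: (5, 9, 10), 22: (4, 5, 8), 30: (3, 5, 9)}
multiple = {119: (9, 9, 10), 22: (5, 6, 6), 30: (7, 7, 7)}

spread_single = {s: decile_spread(r) for s, r in single_measure.items()}
spread_multiple = {s: decile_spread(r) for s, r in multiple.items()}
```

In every one of the three cases the spread under the multiple regressions is at most one decile, against four to six deciles under the single measures.
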
Table V is a presentation of predicted and 
achieved second semester marks in decile dis- 
tribution. This table is chosen for analysis 
because it is based upon a five-variable corre- 
lation, and is the most efficient of the correla- 
tions used for the prediction of second semes- 
ter marks. A brief study of Table V will be 
sufficient to demonstrate that regression equa- 
tion 6, on which the table is based, is a more 
efficient agent for the prediction of second 
semester marks than any of those used to pre- 
dict first semester marks. For example, exact 
decile prediction is twenty-nine per cent in 
Table V, as compared with twenty-four per 
cent for regression equation (3) on which 
Table IV is based, and this is the most effi- 
cient combination for predicting first semes- 
ter marks, as was seen above. The balance 
is also better between the per cent of students 
exceeding or falling below the predicted mark, 
being thirty-seven and thirty-two per cent 
respectively for Table V, as compared with 
forty-five and thirty-four for Table IV. The 
superiority of Table V, however, is slight in 
predicting the marks of abler students, the 
per cent of correct prediction in deciles seven 
to ten being seventy-eight for Table V as 
compared with seventy-six for Table IV. The 
superiority of Table V is apparent in the cen-

TABLE V


DECILE RANKS OF PREDICTED SECOND SEMESTER MARKS OF 126 PHILLIPS UNIVERSITY FRESHMEN,
1934-1935, BASED UPON THE CORRELATION R (SECOND SEMESTER MARKS) (FIRST SEMESTER
MARKS, HIGH SCHOOL AVERAGE, OHIO STATE UNIVERSITY PSYCHOLOGICAL TEST SCORES, AND
ENGLISH PLACEMENT TEST SCORES) = .836, COMPARED WITH CORRESPONDING DECILE RANKS
ACHIEVED


Deciles


5 OO 


10 


3.26 


}.06- 


“Superscript shows rank of student in other distribution. 


tral areas, and marked in deciles one to four, 


Predicted Ranks 
136, 45, 114, 137, 20, 7 
116, 19, 88, 189, 68, 87," 119 
4, 80, 52, 129, 6, 58 


10 10 


90, 32, 27, 75, 70, 9 





85, 13, 24, 99, 23, 105 


30, 33, 55, 11, 117, 135 








6 


26, 1, 59, 127, 25, 2, 94 
109, 51, 86, 54, 15, 44 
128, 106, 126, 65, 101, 53 


31, 43, 131, 18, 115, 120 








110, 14, 83, 124, 92, 113 


96, 64, 62, 104, 22, 63 


~~ 


36, 100, 102, 134, 66, 89 


1, 28, 61, 78, 84, 38 
138, 140, 21, 107, 35, 121 
40, 122, 5, 17, 97, 37, 76, 69 


41, 130, 39, 46, 3, 50, 81 


16, 73, 111, 118, 132 


125, 71, 56, 42, 133, 12, 82 


49, 95, 10, 8, 123 


Achieved Ranks 


a 


136, 114, 1945, 7, 5, 20 


a ’ 


68, 139, 32, 90, 88, 119 


‘ 


1 . “ ’ 
137, 85, 116, 129, 1, 13, 70 


6, 9, 27, 33, 54, 55, 58, 87 
® 8 3 5 
52, 62, 99, 110, 124, 26, 127 


8 ‘ 


, 89 


131, 105, 15, 51 


67, 83, 128, 4, 91, 120, 23 


t oI 


30, 75, 18, 109, 104 

59, 2, 46, 25, 40, 53 

24, 84, 44, 63, 31 

80, 134, 101, 108, 3, 113, 106 


a s 4 4 e ‘ 
135, 117, 78, 69, 115, 11 


14, 121, 122, 107, 35, 132 


86, 43, 37, 66, 22, 138, 16 


‘ 


96, 64, 65, 125, 126, 28 


‘4 


73, 81, 17, 76, 61, 38 


21, 140, 100, 71, 42, 36, 102 
92, 123, 50, 111, 97 

2 2 

39, 41, 8, 95, 10, 49, 56 


133, 12, 83, 130, 94, 118 


as may be observed on Table VII. In Table
V all students predicted to be in decile ten are
found to be there except three, and all three
of these are in decile nine. This high degree
of exact decile prediction is also to be ob-
served in decile one, where all students pre-
dicted to be there are found there except four,
and none of them rise above the third decile.
A student predicted to be in the tenth decile
has 1 in 1.3 chances of being there; whereas a
student predicted to be in deciles one to eight
inclusive has no chance of being in decile ten.
No student predicted to be in the lower half
of the distribution rises above the eighth decile
in achievement.

It may also be observed on Table VII that
regression equation (3) with five variables
predicts first semester marks in the extremi-
ties of the distribution (deciles 7-10 and [illegible])
with nearly as much efficiency as first semes-
ter marks predict second semester marks in
those deciles. Since these are the areas of
superior scholarship and failure, regression


TABLE VI 


NUMBER AND PER CENT OF PHILLIPS UNIVERSITY FRESHMEN, 1934-1935, WHOSE ACHIEVEMENT
MARKS EQUALLED, EXCEEDED, OR FELL BELOW THEIR PREDICTED MARKS, BY DECILES


Regres-      Equalled           Exceeded           Fell Below
sion      Boys Girls All     Boys Girls All     Boys Girls
1 21 9 26 28 29 57 27 0) 

29 14 19 37 45 41 36 16 
9 eee 16 5 21 30 34 64 29 26 
21 8 15 40 52 46 30 10 
B we 18 12 30 33 29 62 24 24 
24 19 21 44 45 45 oo 36 
{ 23 16 37 24 22 46 20 21 
34 27 29 26 38 37 30 26 
21 11 32 18 ol 49 28 15 
ol 20 26 27 54 40 12 Zt 
6 i6 20 36 21 26 16 ov 13 
24 34 29 31 44 ot 45 22 
r 17 11 28 23 37 60 35 17 
23 17 20 31 57 43 47 26 
Q 7 13 13 26 24 26 50 38 15 
17 20 19 32 55 43 47 24 
Q 21 10 3 31 23 54 23 o2 
28 15 22 42 35 39 31 50 
10 ar 29 20 49 17 26 43 21 13 
43 34 39 25 44 34 32 22 


*The upper row in each double row of figures indicates number of students and the lower
row shows per cent.


TABLE VII


NUMBER AND PER CENT OF PHILLIPS UNIVERSITY FRESHMEN, 1934-1935, WHOSE ACHIEVEMENT
MARKS EQUALLED THEIR PREDICTED MARKS IN SPECIFIED
AREAS OF THE DECILE DISTRIBUTION


Regres-   Deciles 7-10    Deciles 4-6    Deciles 1-3    Deciles
sion       B   G  All      B   G  All     B   G  All     B   G
l — 20 20 40 8 7 15 13 6 20 25 12 
: i, 1 71 £0 30 25 50 40 46 74 54 

2 eh 18 36 7 9 16 17 6 23 25 11 
i) i3 i] o7 40 su 66 37 55 78 46 

3 cat. Sa 21 2 6 10 16 16 8 24 23 15 
70 81 76 32 43 38 61 50 57 72 63 

1 17 24 41 Pal 11 18 18 8 26 25 16 
RO 80 80 44 58 49 67 73 69 7 84 

5 20 21 41 7 10 17 15 9 24 22 17 
77 88 82 37 50 44 68 60 65 79 78 

6 - 18 21 39 8 8 16 7 6 13 23 18 
78 78 78 40 7 42 66 66 66 R5 78 

7 24 13 37 6 8) 15 11 9 20 21 17 
66 76 69 33 35 34 61 41 50 81 57 

8 ae 21 16 37 5 6 11 11 8 19 16 16 
62 71 66 22 32 26 61 33 45 64 48 

9 : 21 18 39 7 12 19 15 6 21 29 i) 
81 64 72 35 50 43 50 50 50 73 53 

a gs 22 42 9 9 18 15 11 26 24 14 
83 81 2 45 53 49 68 68 68 80 67 


*The upper row in each double row of figures indicates number of students and the lower
row shows per cent.

equation (3) could probably be used almost
as effectively for guidance of entering college 
freshmen as could first semester marks for 
guidance of students into work for the second 
semester. 

On Tables VI and VII it may be observed 
that high school average marks are a little 
superior to mental tests in predictive effi- 
ciency, in spite of the fact that Ohio State 
University Psychological Examination scores 
correlate somewhat higher with first semester 
marks. American Council Test scores are de- 
cidedly inferior, in predictive efficiency, to the 
other two, especially when limited to specified 
areas of the distribution. Exact prediction 
of decile placement is twenty-two per cent for 
high school average, twenty for the Ohio test, 
and nineteen for the American Council Test. 
The balance between the percentage of stu- 
dents exceeding or falling below predicted 
marks when applied to the student body as a 
whole is somewhat better for high school aver- 
age, the percentages being thirty-nine and 
forty respectively for high school average, 
forty-three and thirty-seven for the Ohio State 
University Psychological Examination, and 
forty-three and thirty-nine for the American 
Council Test. However, when analyzed sep- 
arately for boys and girls, a different picture 
is presented. High school average is the 
poorest of the three measures in predicting 
exact decile placement for girls, the predic- 
tion for high school average being fifteen per 
cent, compared with twenty for American 
Council Test and seventeen for Ohio State 
University Psychological Examination. The 
balance between the per cent of students ex- 
ceeding and those falling below predicted 
marks is quite different for the sexes taken 
separately from what it is for the student 
body as a whole. The boys, in much larger 
proportions than girls, fail to measure up to 
the marks predicted by the mental tests, forty- 
seven per cent of them falling below predicted 
marks in both mental tests, as compared with 
thirty-one and thirty-two per cent exceeding 
predicted marks. The reverse is true of girls, 
fifty-five and fifty-seven per cent exceeding 
marks predicted by mental tests, and twenty- 
four and twenty-six per cent failing to reach 
them. In predictions based upon high school 
average marks the position of the sexes is re- 
versed from that in the mental tests, forty-one 
per cent of boys exceeding predicted marks 
and thirty-one per cent falling below, where-
as only thirty-five per cent of girls exceed pre-
dicted marks and fifty per cent fall below.
Table VII shows that high school average
marks are also a little superior to the mental 
tests in predicting the marks of the abler stu- 
dents, when attention is centered on the stu- 
dent body as a whole, prediction by high 
school average being seventy-two per cent cor- 
rect compared with sixty-six per cent correct 
for the American Council Test and sixty-nine 
for the Ohio State University Psychological 
Examination. Superiority for high school av- 
erage marks in the middle deciles is decided, 
the prediction being forty-three per cent cor- 
rect for high school average, as compared with 
twenty-six for American Council Test and 
thirty-four for the Ohio State University Psy- 
chological Examination. In deciles one to 
four high school average marks are superior 
to American Council Test scores, but slightly 
inferior to Ohio State University Psychologi- 
cal Examination scores, the prediction being 
sixty-seven per cent for high school average, 
fifty-seven for American Council Test, and 
sixty-eight for the Ohio State University Psy- 
chological Examination. However, when the 
sexes are considered separately a quite differ- 
ent picture is presented again. Both mental 
tests predict the marks of the abler girls 
(deciles 7-10) better than does the high 
school average, the predictions being seventy- 
one and seventy-six per cent correct respec- 
tively for the American Council Test and the 
Ohio State University Psychological Exam- 
ination, as compared with sixty-four per cent 
for high school average. But the reverse is 
true for abler boys, the prediction being 
eighty-one per cent correct for high school 
average, and sixty-two and sixty-six respec- 
tively for American Council Test and Ohio 
State University Psychological Examination. 
In deciles one to four the mental tests pre- 
dicted the marks of boys better than those of 
girls, the prediction being sixty-four and 
eighty-one per cent respectively for the Amer- 
ican Council Test and the Ohio State Univer- 
sity Psychological Examination, and forty- 
eight and fifty-seven for girls. About the 
only statement one can make with confidence 
about the comparative predictive efficiency of 
high school average marks, American Council 
Test scores, and the Ohio State University 
Psychological Examination scores is that none 
is best for all levels of ability and for both 
sexes.

Some general statements may be made from
observations on Tables VI and VII. First, 
with two exceptions, the exact prediction of 
decile placement is better for boys than for 
girls, the average percentage of exact decile 
prediction for boys being 27.2 and for girls 
20.3. Second, girls, with but one exception, 
in larger proportions than do boys, tend to 
exceed their predicted marks, the average per 
cent of girls exceeding predictions being 46.9, 
and that of boys 33.4; and conversely, boys 
in larger proportions than girls tend to fall be- 
low their predicted marks, the average per 
cent who fall below being 38.1 for boys and 
32.8 for girls. Third (on Table VII), on the 
whole, the measures used predict the marks of 
abler girls somewhat better than they do those 
of abler boys (deciles 7-10), the average per 
cent of correct prediction for girls being 76.8 
and that for boys 73.3. Fourth (on Table 
VII), the measures used predict the marks of 
the average girl better, with one exception, 
than they do those of the average boy (deciles 
4-6), the average per cent of correct predic- 
tion being 43.8 for girls and 36.5 for boys. 
Fifth (on Table VII), in the lower levels of 
the distribution, especially in the grouping of 
deciles 1-4, prediction for boys is better, with 
one exception, than it is for girls, the average 
per cent of correct prediction being 76.4 for 
boys and 62.8 for girls. Sixth (on Table
VII), with two slight exceptions, the measures
used predict marks of students in deciles
seven to ten with a degree of accuracy above
seventy per cent, when applied to the student
body as a whole, and this prediction is
not far from constant, the average prediction
in this area being 74.7 per cent correct. This
is noteworthy, since the ten correlations used
as the basis of these predictions ranged from
.408 to .836. Seventh (on Table VII), no
correlation used predicted as many as fifty 
per cent correct in deciles four to six inclu- 
sive, when applied to the student body as a 
whole, the average prediction in this area be- 
ing 39.9 per cent correct. Eighth (on Table 
VII), for the ten regressions used, the aver- 
age prediction in deciles one to four is 70.6 per 
cent correct when applied to the student body 
as a whole. This is not far from the effi- 
ciency of prediction (74.7 per cent correct) 
noted in deciles seven to ten. And so the 
statement may be repeated that prediction is 
far more efficient in the upper and lower areas 
of the distribution, and so low as to be of 
little apparent use in the central areas.

The Critical Point in the Prediction of College
Marks

Phillips University requires an average
mark of M (C), designated (3) for the pur-
poses of this investigation, as indicated above,
for graduation. This may be called the success
mark, and may be used as a criterion of suc-
cess in the school. The question then arises:
what mark should a student obtain the first
semester of the freshman year in order to have
as many as 50 chances in 100 of achieving
this necessary average for graduation at the
close of his senior year? When this mark is
known in a school, it may be termed the
critical point in the marking system of that
institution.


In an effort to determine the critical point 
at Phillips University two investigations were
made. First, the average was found of all
marks given in freshman courses for the first
semester, 1934-1935; this was found to be
3.46. The average mark for all advanced
courses for the same semester was found to be
3.42. The average mark in freshman courses
for the second semester was 3.43, while that
in advanced courses was 3.48. It thus ap-
pears, in general, that no higher marks are
given in advanced courses at Phillips Univer-
sity than are given in freshman courses; and
that if a student hopes to achieve the re-
quired average of 3.00 for graduation, he
must obtain about that average in the first
semester of his freshman year. 


Second, the complete records of the marks 
of bachelor of arts graduates for the years 
1932, 1933, and 1934 were studied. The 
average mark for the first semester was com- 
pared with the average mark for all other 
courses taken before graduation, for each stu- 
dent. Table VIII shows the results of this 
investigation. 


Table VIII seems to warrant the statement 
that marks received by bachelor of arts grad- 
uates at Phillips University for the first se- 
mester of the freshman year are approximately 
equal to those obtained in later courses pur- 
sued. So we may conclude again that if a 
student hopes to obtain the average mark of 
3.00 required for graduation at Phillips Uni- 
versity, he must achieve approximately that 
mark for the first semester of his freshman 
year. Hence the critical point at Phillips 
University may be said to coincide approxi- 
mately with the success mark.

TABLE VIII


AVERAGE MARKS OF BACHELOR OF ARTS GRADUATES FOR THE YEARS 1932, 1933, AND 1934,
COMPARING AVERAGE MARKS FOR THE FIRST SEMESTER OF THE FRESHMAN YEAR WITH
THE AVERAGE OF LATER MARKS OBTAINED


Number of      First Semester      Average of Later
Students       Average             Marks Obtained
   68          3.68                3.67
   57          3.69                3.75
   47          3.58                3.56

Total 172      3.65                3.66

This conclusion is confirmed, at least in
part, by the fact that of the forty-two stu-
dents in this study who failed to reach the
required average of 3.00 for the first semester,
and who continued their work for the second
semester, only nine achieved the mark. The
marks of twenty-two of them were lower for
the second semester than for the first, while
those of twenty were higher. Only one stu- 
dent whose average mark was below 2.46 for 
the first semester reached the success mark 
for the second semester. Of the sixteen stu- 
dents of the study who withdrew from school 
at the end of the first semester, thirteen had 
failed to reach the success mark. 

On the assumption, then, that 3.00 is the 
critical point, as well as the success level at 
Phillips University, what are the chances that 
a student with a given mark, predicted by any 
of the regression equations used in this study, 
will achieve the success mark, and so will 
probably graduate from the school? The 
answer to this question, so far as the data and 
techniques used in this study can answer it, 
is presented in connection with regression 
equation 2 of this study, in which the multi- 
ple correlation of first semester average of 


TABLE IX


CHANCES OF FAILURE AND SUCCESS FOR MARKS [caption truncated in source]
X, = .014X, + .500X, + .249

PE      Per Cent      Per Cent
[table body illegible in source]


freshman marks with Ohio State University
Psychological Examination scores and high
school average marks is .605, with a probable
error of estimate of ± .436 of a grade point.
The regression equation is X, = .014X, +
.500X, + .249. Any mark obtained by the
use of the equation must be evaluated in 
terms of the probable error of estimate. Take 
for example, student 136, whose predicted 
mark is 4.49. What are his chances of 
achieving the success mark of 3.00? His pre- 
dicted mark is 1.49 grade points above the 
required mark. Dividing this 1.49 by the 
probable error of estimate, .436, the quotient 
is 3.4. This means that the predicted mark 
of 4.49 is 3.4 probable errors above the criti- 
cal point of 3.00. This, based upon the nor- 
mal curve, gives student 136 about ninety- 
nine chances (98.9) in one hundred of achiev- 
ing the success mark. Student 46, who has 
a predicted mark of 3.00 has the same chance 


to succeed as he has to fail, that is, fifty 
chances in one hundred, either way. 
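
The computation worked above for students 136 and 46 can be sketched in a few lines. This is only an illustrative check, not part of the original study: the function name `chances_in_100` is hypothetical, and the sketch assumes the usual convention that one probable error equals .6745 of a standard deviation of the normal curve.

```python
import math

# Assumption: one probable error (PE) equals 0.6745 standard deviations.
PE_TO_SIGMA = 0.6745


def chances_in_100(predicted, critical=3.00, pe_est=0.436):
    """Chances in 100 that a student with `predicted` mark reaches `critical`.

    `pe_est` is the probable error of estimate of the regression equation
    (.436 of a grade point for regression equation 2 in the text).
    """
    pe_rating = (predicted - critical) / pe_est   # distance in probable errors
    z = pe_rating * PE_TO_SIGMA                   # same distance in sigma units
    normal_cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return 100 * normal_cdf


print(round(chances_in_100(4.49), 1))  # student 136, predicted mark 4.49
print(round(chances_in_100(3.00), 1))  # student 46, predicted mark 3.00
```

The sketch uses the unrounded quotient 1.49 / .436, whereas the text rounds it to 3.4 probable errors first; for student 136 both routes round to the same 98.9 chances in one hundred.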

In order to illustrate more fully the place
of the critical point in the prediction of col-
lege marks, Table IX, based upon the normal
curve and derived from regression equation 2,
given above, was prepared. For convenience
in computation, probable error ratings are
given in tenths. Beginning at the critical
point, 3.00, which has a zero probable error
rating, one-tenth of a probable error, .0436
of a grade point, is added to, or subtracted
from, the predicted score as the probable
error varies in tenths up or down the scale.
This technique is adopted from Segel (11, p.
55). Now the chances that any student with
a given mark predicted by regression equation
2 will achieve the success mark of 3.00 may be
read directly from the table. Take student
one, for example, with a predicted mark of
3.43. From Table IX, his chances for suc-

TABLE X 


STUDENTS WHOSE MARKS REGRESSION EQUATION TWO (X, = .014X, + .500X, + .249) PREDICTED
TO BE BELOW THE CRITICAL MARK, THEIR CHANCES OF ACHIEVING SUCCESS,
AND THEIR ACHIEVED MARKS


Predicted Achieved % Suc- 


Student Mark Mark cessful 
OP sconcnon Se 2.47 49 
—— Ee | CF 2.74 49 
TO. wecensees S58 2.13 48 
== 2.95 3.14 47 
_ 2.95 3.00 47 
ae 2.95 2.00 47 
— 2.94 2.46 46 
aaa 2.92 2.79 45 
78 ae 2.90 3.00 44 
TED ccnitiararssenubo 2.90 2.33 44 
— 3.00 43 
a 2.89 2.91 43 
ae 2.88 3.00 42 
OF immocccus Bae 2.57 42 
BED scccncacs BSS 3.64 42 
i 2.85 3.67 41 
Pee 2.84 1.80 40 
SN inst secte race 2.83 1.50 39 
a iniiaeat 2.82 2.83 39 
————a 2.13 38 
Ree 2.79 2.50 37 
nts ate 3.00 37 
_ eee 2.76 3.77 35 
. (ee 2.75 2.92 35 
| ae 2.72 2.92 34 
a 2.72 3.44 34 
BD wancecwes 2.70 3.00 32 


“Underscored numbers refer to girls. 


Predicted Achieved “ Suc- 








Student Mark Mark cessful 

rn 2.68 2.57 31 
NE seh cake cal 2.68 1.83 31 
————— 2.68 3.06 31 
TE anneccons 2.67 2.87 31 
ae 2.67 4.07 31 
ae 2.66 2.70 30 
en 2.64 3.43 29 
_——_————a 2.60 3.44 27 
ge 2.61 3.19 27 
BAL anesnenee 2.61 2.63 27 
eee 2.58 2.67 26 
BD sencenecs 2.57 3.38 25 
eae. < 2.56 2.19 25 
OD encowenne 2.5% 2.85 23 
eee 2.52 2.18 23 
Wn Satins eae 2.52 1.70 23 
| ae 2.52 3.13 23 
EE 2.51 2.23 22 
. (PP Sererere 2.50 2.20 22 
SRE 2.45 2.46 20 
ee 2.44 1.25 20 
eee 2.42 1.00 19 
_ eee 2.42 2.27 19 
| 2.38 1.67 18 
OE ccestiieneccetintatle 2.35 2.47 16 
ee 2.28 2.81 15 
Sas 2.07 1.00 8 
ee 2.03 2.31


cess are observed to be seventy-five in one
hundred, and his chances for failure twenty-
five in one hundred. Student three, with a
predicted mark of 2.28, has fourteen chances
in one hundred of achieving the success mark,
and eighty-six chances in one hundred of fail-
ing to do so.

Tables X and XI present a complete list of
the students of this study with their marks
predicted by regression equation 2 (X, =
.014X, + .500X, + .249) and interpreted
with reference to the critical point by the use
of Table IX. It may be observed from Table
X that the chances of reaching the success
mark, for students predicted to fall below the
critical point, range from forty-nine in one
hundred down to seven in one hundred.
Although, theoretically, all students in the


TABLE XI 


STUDENTS WHOSE MARKS REGRESSION EQUATION TWO (X, = .014X, + .500X, + .249) PREDICTED
TO BE ABOVE THE CRITICAL POINT, THEIR CHANCES OF FAILING,
AND THEIR ACHIEVED MARKS


Predicted Achieved %
Mark Mark Failing
4.49 5.00
4.26 4.37
4.25 4.36
4.24 4.50
4.20 4.63
4.09 3.14
4.05 4.31
4.03 3.71
4.01 4.80
3.96 4.75
3.94 3.79
3.91 3.57
3.91 4.00
3.90 3.38
3.90 3.50
3.86 4.77
4.53
4.80


Predicted Achieved 

ing Student Mark Mark 
oe . oa 3.50 
i 3.49 
101 _- yee 3.49 
a a 3.48 
117 aes ee 
30 - _. 3.45 
1 : 3.43 
105 3.43 
134 42 
55 .- 3 
106 08 
21 .oe 
44 .. Ry 

70 


9° 


ye 
fa 


96 

4.48 120 26 
3.19 15 24 
3.83 127 3.21 


61 3.20

[Entries of the % Failing column are illegible in the source.]
3.38 
4.00 aa 
dé 
4.12 
3.56 
3.38 
3.00 
4.25 
3.73 
4.61 
4.00 
3.43 
4.25 
3.31 
3.05 
3.06 19 
4.13 
3.13 21 
3.31 21 
1.00 21 
3.50 2.94 22 
3.50 4.36 22 




“Underscored numbers refer to girls. 











distribution have some chance of achieving 
the success mark, it may be seen from the 
column of achieved marks that no student 
with a predicted mark below 2.52 succeeded 
in doing so. Prediction with reference to the 
critical point in the lower part of the distri- 
bution when applied to both sexes together is 
sixty-seven per cent correct, thirty-seven of 
the fifty-five students predicted to fall below 
the mark of 3.00 being found to do so. When 
applied to boys alone prediction in this area 
is eighty-two per cent correct, twenty-three of
the twenty-eight boys predicted to fall there 
being found to do so. When applied to girls 
alone on this level, prediction is only fifty- 
two per cent correct, fourteen out of twenty- 
seven predicted to be in this area being found 
to be there. 

It may be seen from Table XI that predic- 
tion above the critical point, when applied to 
both sexes together, is much more accurate 
than was the case below the critical point. 
Eighty-five students were predicted to exceed 
the critical point, and sixty-nine, or eighty-one 
per cent of them, were found to do so. Pre- 
diction here is also more nearly equal for the 
sexes considered separately, being seventy- 
nine per cent correct for boys and eighty-two 
per cent correct for girls. Chances in one hun- 
dred for failure to be above the critical point, 
as predicted in this area, are from one up to 
fifty. Theoretically, every student of the 
distribution has some chance of falling below 
the critical point, but it may be observed that 
no one with a predicted mark above 3.52 falls 
below that point; just as no student predicted 
to fall below mark 2.52 rose above it. Pre- 
diction with reference to the critical point is, 
therefore, perfect in the distribution above 
mark 3.52 and below mark 2.52, that is, above 
probable error rating + 1.2 and below prob- 
able error rating — 1.1. That is to say, the 
error in prediction by the use of this tech- 
nique is almost entirely confined to the area 

± 1 probable error around the critical point.
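
The way a table such as Table IX is built from a probable error of estimate can be sketched as follows. This is a minimal illustration, not Segel's or the author's own procedure as printed: the function name `critical_point_table` and the range of ratings are assumptions, and one probable error is again taken as .6745 of a standard deviation.

```python
import math


def normal_cdf(z):
    """Area of the standard normal curve below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def critical_point_table(pe_est, critical=3.00, max_rating=3.0, step=0.1):
    """Rows of (PE rating, predicted mark, chances in 100 of success).

    Ratings run in tenths of a probable error above and below the
    critical point; `max_rating` is an arbitrary cut-off for the sketch.
    """
    rows = []
    n = round(max_rating / step)
    for k in range(-n, n + 1):
        rating = k * step
        mark = critical + rating * pe_est          # mark at this PE rating
        chance = 100 * normal_cdf(0.6745 * rating)  # chances of reaching 3.00
        rows.append((round(rating, 1), round(mark, 2), round(chance)))
    return rows


# Regression equation 2 has a probable error of estimate of .436.
lookup = {r: (m, c) for r, m, c in critical_point_table(pe_est=0.436)}
print(lookup[1.0])                 # one probable error above the critical point
print(lookup[1.2], lookup[-1.1])   # the ratings bounding the region of error
```

Under these assumptions the ratings + 1.2 and - 1.1 correspond to the marks 3.52 and 2.52 named in the text, and a rating of 1.0 corresponds to a mark of 3.44 with seventy-five chances in one hundred.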

When marks predicted by another regres- 
sion equation are to be evaluated by this tech- 
nique another table based upon the probable 
error of estimate of that equation must be 
constructed, or the chances for success of each 
student’s mark must be figured separately, as 
was done in the case of student 136, above. 
All parts of Table X however, except the first 
column, would remain constant for all the re- 
gression equations used in the study. Pre- 
dicted marks in the first column, correspond- 


g JOURNAL OF EXPERIMENTAL EDUCATION 





(Vol. 6, \ 


ing to the probable error ratings, would vary
according to the size of the probable error of
estimate of the regression equation for which
the table was being constructed. In a table
prepared for use with regression equation six,
for example, whose probable error of estimate
is .294 of a grade point, the predicted mark
corresponding to a probable error rating of
1.0 on the table would become 3.29, instead of
3.44, as appears in Table IX; whereas a
score of 3.44 on such a table would have a
probable error rating 2.6 above the critical
point of 3.00, and the student would have
ninety-six chances in one hundred of achiev-
ing the success mark, instead of seventy-five
chances in one hundred, as in Table IX.


Differential Prediction of College Marks


The question of the student’s relative suc-
cess in the different subjects is a problem sec-
ond only to that of his general success in the
institution; and his general success is closely
tied up with, and often dependent upon, his
success in the different subjects. This is
especially so in schools where many prerequi-
sites for graduation exist in the different de-
partments and curricula. Tests and measures
thus far devised correlate higher with gen-
eral success than with marks in the separate
subject fields. The labor involved in the sta-
tistical processes necessary for prediction in
the separate fields has also been a deterrent
to investigations here. However, a technique
has recently been developed by which the pre-
diction of differences in a student’s achieve-
ment in the different subject fields is greatly
facilitated. David Segel is the author of this
procedure (11, pp. 76-89), a brief description
of which follows:


Segel’s formula for finding the correlation
between a predicting agent and the difference
of marks received in two subject fields is

$$ r_{(a-b)x} = \frac{r_{ax}\sigma_a - r_{bx}\sigma_b}{\sqrt{\sigma_a^2 + \sigma_b^2 - 2 r_{ab}\sigma_a\sigma_b}} $$

where (a)
is one subject or field and (b) another, and
(x) is the predicting agent. This is simply 
an expansion of the regression r,,, with 
(a — b) taking the place of x, and can easily 
be shown to produce the same results as if 
the individual differences in marks in the two 
subjects were found and these correlated with 
the x-scores. Segel demonstrated this. For
example, where (a) is mathematics and (b)
is science, and (x) is the Ohio State Univer-
sity Psychological Examination; and sigma
(a) is 1.11, sigma (b) 1.07 and sigma (x) 20.99,
and the correlation of (a) and (x) is .561,
of (b) and (x) .364, and of (a) and (b)
.714, then

$$ r_{(a-b)x} = \frac{.561 \times 1.11 - .364 \times 1.07}{\sqrt{1.11^2 + 1.07^2 - 2(.714 \times 1.11 \times 1.07)}} = .283 $$

This means that students with
higher Ohio State University Psychological
Test scores will make higher marks in mathe-
matics than in science, while those with low
scores in Ohio State University Test may do
as well in science as in mathematics, or pos-
sibly, better. The regression equation for the
prediction of (a − b) from (X) is

$$ X_d = r_{(a-b)x}\,\frac{\sigma_{(a-b)}}{\sigma_x}\,(X - M_x) + M_{(a-b)} $$

where $X_d$ is the difference between the marks in two
subject fields, $M_x$ is the mean of the predict-
ing agent and $M_{(a-b)}$ the difference in the
means of marks in the subject fields. Using
the data in the example above, in which sigma
(a − b) is .825, and $M_x$ is 72.84, $M_a$ is 3.54,
and $M_b$ is 3.17,

$$ X_d = .283 \times \frac{[\text{illegible}]}{20.99}\,(X - 72.84) + .37 = .00904X - .315 $$

The formula for
probable error of estimate is

$$ PE_{est\,(a-b)x} = .6745\sqrt{\left[r_{(a-b)x}\,\sigma_{(a-b)}\right]^2 \times (1 - r_{xx})} $$

in which $r_{xx}$ is the reliability of the X-variable
or predicting agent. In our example,

$$ PE_{est\,(a-b)x} = .6745\sqrt{\left[(.283)(.825)\right]^2 \times (1 - .92)} $$

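
Segel's formula and the mathematics-science example can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the small lists `a`, `b` and `x` are made-up marks and scores (not data from this study) used solely to show that the formula agrees with correlating the individual differences directly.

```python
import math


def difference_correlation(r_ax, r_bx, r_ab, sd_a, sd_b):
    """Correlation of predictor x with the difference (a - b), and sigma(a - b)."""
    sd_diff = math.sqrt(sd_a ** 2 + sd_b ** 2 - 2 * r_ab * sd_a * sd_b)
    r_diff = (r_ax * sd_a - r_bx * sd_b) / sd_diff
    return r_diff, sd_diff


def mean(v):
    return sum(v) / len(v)


def sd(v):
    m = mean(v)
    return math.sqrt(sum((t - m) ** 2 for t in v) / len(v))


def pearson(u, v):
    mu, mv = mean(u), mean(v)
    cov = sum((s - mu) * (t - mv) for s, t in zip(u, v)) / len(u)
    return cov / (sd(u) * sd(v))


# Values quoted in the text: mathematics (a), science (b), Ohio test (x).
r_diff, sd_diff = difference_correlation(0.561, 0.364, 0.714, 1.11, 1.07)
print(round(r_diff, 3), round(sd_diff, 3))

# Hypothetical data: the formula matches the direct correlation of differences.
a = [4, 3, 5, 2, 3]
b = [3, 3, 4, 3, 2]
x = [80, 65, 90, 50, 70]
d = [s - t for s, t in zip(a, b)]
via_formula, _ = difference_correlation(
    pearson(a, x), pearson(b, x), pearson(a, b), sd(a), sd(b)
)
print(abs(via_formula - pearson(d, x)) < 1e-9)
```

With the quoted values the sketch reproduces the correlation of .283 and the sigma (a − b) of .825 given in the text.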
The technique described above was used in 
this investigation in an effort to determine 
how well Ohio State University Psychologi-
cal Examination scores, American Council
Test scores and high school average marks can
predict differences in marks to be achieved in 
the first semester of the college freshman year, 
in the different subject fields. All the data 
needed for this phase of the study are found 
in Table I. From these data the correlations 
shown in Table XII were calculated. 

A brief study of Table XII reveals the fact
that correlations of high school average marks 
and differences in subject field marks are 
higher, in general, than are correlations of the 
mental tests with differences in subject field 
marks. This would seem to indicate that 
high school average is superior to mental 
tests in the differentiation of student achieve- 
ment in the subject fields. In practice, how- 
ever, this does not prove to be so, chiefly be- 
cause of the unreliability of high school 
marks, as compared with that of the mental
tests. The reliability of high school marks,
as used in this study, is estimated at .60, 
while that of both mental tests is .93 or above. 
This causes the probable error of estimate to 
be much higher, in most cases, for equations 
in which high school marks is a factor than 
for those involving the mental tests. This 
situation stresses the supreme importance of 
reliability in predictive studies. A relatively 
high correlation coefficient counts for but lit- 
tle, if the probable error of estimate is also 
high. 

It may also be observed from Table XII 
that correlations of subject differences with 
the two mental tests are, in general, about the 
same, except where foreign language is a mem- 
ber of the prediction pair. In that case the 
results are quite different. The Ohio State 
University Psychological Examination exalts 
foreign language to a place slightly above 
mathematics and far above English, science 
and social science, whereas the American 
Council Test rates mathematics far above for- 
eign language, and places foreign language on 
about equal terms with English, science, and 
social science. It may also be noted that the 
probable errors of estimate are, in general,
about the same for the two mental tests.

From data given in Tables I and XII, four- 
teen multiple correlations were run, in an ef- 
fort to find the most efficient combination of 
factors for the predictive differentiation of 
student achievement in the subject fields.
The six highest of these correlations, together
with their probable errors of estimate and re-
gression equations are shown in Table XIII.

Only two of the correlations in Table XIII
are above .50, and each of these contains high
school marks as a factor in the prediction
team. It may be observed again that the
probable error of estimate is larger in the
equations involving high school marks (X,)
as a factor in the predicting team. This fact 
again offsets the seeming advantage of higher 
correlations in the first two prediction pairs 
in the table, making the predictive value of 
those equations very small. 

From Tables XII and XIII, the following 
regression situations were chosen for predic- 
tion in this phase of the study: Only those 
equations with r or R above .150 were used. 
The writer is aware of the fact that only two 
of these correlations are above the minimum 
usually set for predictive purposes. But, 
since the chief purpose of this study is to de- 
termine the predictive value of the measures 
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TABLE XII 


CORRELATIONS, PROBABLE ERRORS, AND PROBABLE ERRORS OF ESTIMATE OF OHIO STATE UNIveR 
SITY PSYCHOLOGICAL EXAMINATION SCORES, AMERICAN COUNCIL TEST SCORES, AND HIGH 


ScHOOL AVERAGE MARKS WITH DIFFERENCES 
FOR THE FIRST SBMESTER, 1934-1935 

Ohio 
Prediction Psy. 
Pairs Test 
Math.—Eng." ; 7 = .209 
Er g£ Se - 122 
Eng.—F. Language as 171 
Eng.—Soc. Science iti .040 
Math.—Scl. re ee 280 
Math.—F.. Language F -_ —.002 
Math.—Soc. Science —_--_-- =e aa 
Sci.—F. Language —-- wwe —— OOO 
Sci.—Soc. Science —.-- ~~. Soa —.061 
F. Lang.—S. Science ca ae re esa .180 


IN COLLEGE MARKS IN THE SUBJECT Fipy 


PE: & H.S. PE-& A.C. E. PE-é 
PEest. AV. PE. Test P} 

+ .099 .239 O82 -209 T° 
+ .025' .067 + 095 
+ 067 ATi O71 —.015 On 
+ .021 .069 002 
+ .065 409 059 085 + 067 
+ 026 .146 0] 

+ .067 .244 .062 .033 + O68 
+ .031 096 + 005 
+ .098 91 O89 172 if 
+ 055 138 22 
+ 106 .o8D O89 196 

~ .001 .190 + 002 
+ O89 .ot7 .O89 181 

+ .031 .164 ~ O09 
+ 063 .130 .066 101 + 067 
+ .041 .054 = 025 
+ 067 067 074 047 + 068 
+ .011 057 + .02 

+ .065 —.061 .078 —.041 ~ .069 
+ .032 .065 + 008 


(a) The first subject of the pair is a and the second b, as later used in the expression r(a-b)x.
(b) The lower number in each case is the probable error of estimate.


TABLE XIII


MULTIPLE CORRELATION COEFFICIENTS, WITH PROBABLE ERRORS OF ESTIMATE, AND REGRESSION
EQUATIONS FOR PREDICTION PAIRS IN CERTAIN SUBJECT FIELDS

Prediction                      PE of
Pairs                  R        Estimate    Regression Equation
Math. - Soc. Sci.      .521     ± .249      Xa = .0023X1 + .0021X2 + .613X3 - 1.69
Eng. - F. Lang.        .654     ± .256      Xa = .0239X1 + .0069X2 + .884X3 - 2.08
Math. - Science        .405     ± .157      Xa = .003X1 + .0009X2 + .455X3 - 1.72
Math. - English        [?]      ± .130      Xa = .0005X1 + .0019X2 + .18X3 - .767
Math. - Science        .468     ± .090      Xa = .0164X1 - .0021X2 - .48
Math. - English        .261     ± .028      Xa = .0024X1 + .0024X2 - .29


used, and not to pass judgment upon the ade- 
quacy of such value, a relatively large number 
of correlations and regression equations have 
been included for illustrative and comparative 
purposes: 

1. The correlation of mathematics minus
social science with Ohio State University
Psychological Examination scores is .171, with a
probable error of estimate of ± .031. The
regression equation is Xa = .0084X1 + .158.

2. The correlation of mathematics minus
social science with American Council Test
scores is .181, with a probable error of
estimate of ± .009. The regression equation is
Xa = .0037X2 + .161.

3. The correlation of mathematics minus
social science with high school average marks
is .377, with a probable error of estimate of
± .164. The regression equation is
Xa = .651X3 - 1.74.

4. The correlation of English minus
foreign language with Ohio State University
Psychological Examination scores is -.171, with
a probable error of estimate of ± .026. The
regression equation is Xa = -.0068X1 + .575.

5. The correlation of English minus
foreign language with American Council Test
scores is .085, with a probable error of
estimate of ± .011. The regression equation is
Xa = .0014X2 - .15.

6. The correlation of English minus
foreign language with high school average marks
is .409, with a probable error of estimate of
± .146. The regression equation is
Xa = .575X3 - 2.14.

7. The correlation of mathematics minus
English with Ohio State University
Psychological Examination scores is .209, with a
probable error of estimate of ± .025. The
regression equation is Xa = .0067X1 - .208.

8. The correlation of mathematics minus
English with American Council Test scores is
.233, with a probable error of estimate of
± .025. The regression equation is
Xa = …3X2 - .214.

9. The correlation of mathematics minus
English with high school average marks is
.239, with a probable error of estimate of
± .067. The regression equation is
Xa = .270X3 - .765.

10. The correlation of mathematics minus
science with Ohio State University
Psychological Examination scores is .283, with a
probable error of estimate of ± .055. The
regression equation is Xa = .0094X1 - .315.

11. The correlation of mathematics minus
science with American Council Test scores is
.172, with a probable error of estimate of
± .022. The regression equation is
Xa = .0028X2 - .091.

12. The correlation of mathematics minus
science with high school average marks is
.391, with a probable error of estimate of
± .138. The regression equation is
Xa = .542X3 - 1.717.

13. The correlation of foreign language
minus social science with Ohio State University
Psychological Examination is .180, with
a probable error of estimate of ± .032. The
regression equation is Xa = .0085X1 - .209.

14. The correlation of foreign language
minus science with Ohio State University
Psychological Examination is .250, with a
probable error of estimate of ± .041. The
regression equation is Xa = .0114X1 - .820.

15. The correlation of mathematics minus
foreign language with American Council Test
scores is .196, with a probable error of
estimate of ± .002. The regression equation is
Xa = .0045X2 - .37.

16. The correlation of mathematics minus
foreign language with high school average
marks is .385, with a probable error of
estimate of ± .190. The regression equation is
Xa = .751X3 - 2.540.

17. The correlation of mathematics minus
science with Ohio State University
Psychological Examination scores and American
Council Test scores is .468, with a probable
error of estimate of ± .090. The regression
equation is Xa = .0164X1 - .0021X2 - .48.

18. The correlation of mathematics minus
English with Ohio State University
Psychological Examination scores and American
Council Test scores is .261, with a probable
error of estimate of ± .028. The regression
equation is Xa = .0024X1 + .0024X2 - .29.

19. The correlation of mathematics minus
social science with Ohio State University
Psychological Examination scores, American
Council Test scores, and high school average
marks is .521, with a probable error of
estimate of ± .249. The regression equation is
Xa = .0023X1 + .0021X2 + .613X3 - 1.69.

20. The correlation of English minus
foreign language with Ohio State University
Psychological Examination scores, American
Council Test scores, and high school average
marks is .654, with a probable error of
estimate of ± .256. The regression equation is
Xa = .0329X1 + .0069X2 + .884X3 - 2.08.

With each of these equations the differences 
in marks between the pair of subjects indi- 
cated were predicted for each student and the 
mean of these predicted differences found. 
This mean was used as the basis for the de- 
termination of the reliability or efficiency of 
the predictions. If the predicted difference 
of a student in a given pair of subjects varies 
as much as four probable errors of estimate 
from the mean his score is considered en- 
tirely reliable. If his predicted difference is 
two probable errors away from the mean his 
score is eighty-two per cent reliable, that is, 
the chances are eighty-two in a hundred that 
the difference between those two subjects will 
be in the same direction for other students 
who have the same marks in the predicting 
variable or variables, as this student has. 
This technique was adopted from Segel (11, 
p. 85). 
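
The procedure just described may be illustrated by a brief sketch. It rests on two assumptions that the text does not state: that a probable error equals 0.6745 standard deviations of a normal curve (the conventional definition), and that the reliability figures are areas of that curve within the stated number of probable errors of the mean. Under those assumptions the area within two probable errors is about 82 per cent, which agrees with the figure adopted from Segel. The equation and its probable error of estimate are those of regression situation 19; the three students and their scores are hypothetical.

```python
import math

PE_IN_SIGMAS = 0.6745  # assumed size of one probable error, in standard deviations


def area_within(k_pe):
    """Normal-curve area within k probable errors on either side of the mean."""
    z = k_pe * PE_IN_SIGMAS
    return math.erf(z / math.sqrt(2))


def predict_difference(x1, x2, x3):
    """Regression situation 19: predicted mathematics minus social science mark."""
    return 0.0023 * x1 + 0.0021 * x2 + 0.613 * x3 - 1.69


def pe_multiples(predictions, pe_est):
    """Distance of each prediction from the mean prediction, in units of PE."""
    mean = sum(predictions) / len(predictions)
    return [abs(p - mean) / pe_est for p in predictions]


# Hypothetical students: (Ohio score, American Council score, high school average).
students = [(60, 150, 3.2), (95, 210, 2.4), (40, 120, 3.9)]
predicted = [predict_difference(*s) for s in students]
multiples = pe_multiples(predicted, pe_est=0.249)

for scores, diff, k in zip(students, predicted, multiples):
    if k >= 4:
        label = "four PE or more from the mean"
    elif k >= 2:
        label = "two PE or more from the mean"
    else:
        label = "less than two PE from the mean"
    print(scores, round(diff, 2), round(k, 2), label)

# Areas of the normal curve within two and within four probable errors.
print(round(area_within(2), 3), round(area_within(4), 3))
```

The last line shows why a deviation of four probable errors could be treated as entirely reliable: under the stated assumptions the corresponding area exceeds 99 per cent.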

The results of these predictions are shown 
in Table XIV. It may be observed that 
wherever High School Average marks (X3) is
the predicting agent, or a member of the pre- 
dicting team, the predictive efficiency is low, 
compared with that of the mental tests. It 
may also be noted that there is little to choose 
between the predictive efficiency of the Ohio
State University Psychological Examination
and the American Council Test in this phase 
of the study. 

It is somewhat difficult to generalize on pre- 
dictive efficiency, as between pairs of subjects 
listed in Table XIV. Wherever mathematics 
or social science is a member of the prediction 
pair the discrimination in achievement is, in 
general, best. This is especially true of 
mathematics. A somewhat surprising dis- 
crimination appears between English and for- 
eign language, the efficiency of prediction be- 
ing almost as great between these two subject 
fields as between Mathematics and English. 
Taking the whole of Table XIV into consid- 
eration, one might generalize by saying that 
the subject fields, ranked from highest to 
lowest on the basis of the chances a student 
has to succeed in them, take the following 
order: Mathematics, English, Foreign Lan- 
guage, Science, and Social Science. 

The practical value of these findings is 
somewhat obscure. The last column of Table 
XIV shows that in only four of the equations 
are predictions entirely reliable in more than 
sixty per cent of the cases. There is a con- 
siderable probability that a difference exists 
between the subject pairs in the directions
indicated, in about five-sixths, or eighty-three
per cent, of the cases on average, if the
equations in which high school marks occur are
omitted. This may possibly be of some value
in advising with students about these subjects
when they are prerequisite to certain
curricula that are basic to given vocations.


V. SUMMARY 

The purpose of this study was to determine
the efficiency of certain measures and
combinations of measures in the prediction of
college freshman marks, no attempt being
made to place an absolute evaluation upon
the determined efficiency, or to increase it.

Five variables were used as general predictive
agents. They are the Ohio State University
Psychological Examination, Form 18;
the American Council Psychological Test;
high school average marks; Purdue Placement
Test in English; and first semester college
freshman marks. In addition to these
general measures, separate marks in the five
subject fields of English, mathematics, science,
foreign language, and social science, on both
high school and college levels, were included.

Intercorrelations among these measures 
were calculated, furnishing a basis for certain 
direct deductions, as well as data for multiple 
correlations and regression equations, and 


TABLE XIV


AVERAGE PREDICTED DIFFERENCES OF COLLEGE FRESHMAN MARKS IN THE SUBJECT FIELDS, AND
THE PERCENTAGE OF STUDENTS VARYING AS MUCH AS TWO OR FOUR PROBABLE ERRORS
OF ESTIMATE FROM THE AVERAGE IN THE TWENTY REGRESSION SITUATIONS

Prediction                                       Per Cent of       Per Cent of
Pair of              Regression    Av. Pre-      Pupils Two PE(a)  Pupils Four PE(b)
Subjects             Situation     dicted Diff.  from Average      from Average
Math. - S. Sci.           1           .93             79                69
Math. - S. Sci.           2           .95             94                94
Math. - S. Sci.           3           .78             53                16
F. Lang. - Eng.           4           .35             80                60
Eng. - F. Lang.           5           .05             81                60
Eng. - F. Lang.           6           .69             46                 9
Math. - Eng.              7           .36             [?]               53
Math. - Eng.              8           .38             83                60
Math. - Eng.              9           .35             29                 0
Math. - Sci.             10           .44             79                60
Math. - Sci.             11           .44             78                60
Math. - Sci.             12           .46             37                 5
F. Lang. - S. Sci.       13           .44             90                69
F. Lang. - Sci.          14           .35             75                54
Math. - F. Lang.         15           .67            100               100
Math. - F. Lang.         16           .64             47                 0
Math. - Sci.             17           .69             70                45
Math. - Eng.             18           .38             91                56
Math. - S. Sci.          19           .91             45                 0
Eng. - F. Lang.          20           .45             20                 0

(a) These are eighty-two per cent reliable.
(b) These are entirely reliable.


supplying materials for differential predictions
among the subject fields. 

From a larger number of regression equa- 
tions developed, ten were used for the predic- 
tion of college freshman marks. The pre- 
dicted marks were analyzed (a) by compar- 
ing the averages and sigmas of predicted 
marks with the averages and sigmas of cor- 
responding achieved marks; (b) by noting 
the difference between predicted and achieved 
marks for each student, and finding the aver- 
age deviation and range of difference between 
the two; (c) by throwing the one hundred 
forty students of the study into a decile distri- 
bution for both predicted and achieved marks, 
each student being identified both individually 
and by sex; and (d) by comparing predicted 
and achieved marks with reference to a criti- 
cal point, and noting the efficiency of predic- 
tion with reference to that point. 

VI. 

1. There is little to choose between the pre- 
dictive efficiency of high school average marks 
and Ohio State University Psychological Ex- 
amination scores, when no distinction of sexes 
is made. High school marks predict college 
marks for boys better than for girls; Ohio 
State University Psychological Examination 
scores predict marks better for boys in the 
lower part of the curve, while the reverse is 
true in the upper part. Prediction by the 
American Council Test is inferior to that of
the high school average or Ohio State Uni- 
versity Psychological Examination; but the 
order of efficiency as between the sexes and as 
between the upper and lower parts of the 
curve, is the same as that of the Ohio State 
University Psychological Examination. 

2. Analysis of correlations between specific 
high school subjects or subject fields and cor- 
responding college subjects or subject fields 
yields little evidence to support the traditional 
practice of demanding prerequisites or credit- 
patterns in high school as essential to success 
in college. 

3. Relatively greater accuracy in the upper 
and lower deciles of a ranked distribution, 
sensed but not explained by Whipple (15, p. 
262ff), permits the practical use of smaller 
correlations for prediction than traditional 
thought has sanctioned. 

4. Given a multiple correlation and a zero- 
order correlation approximately equal in size 
(with no statistically significant difference
between the two), this study offers evidence that
the multiple correlation is the more reliable
for purposes of prediction.

5. There is evidence that none of the gen- 
eral measures used in this study has equal 
predictive efficiency for both sexes, the men- 
tal tests relatively over-rating the boys and 
high school marks relatively over-rating the 
girls. When combined, high school marks 
and mental tests tend to counteract each other 
and to yield more uniform results with the 
sexes. 

6. It is possible, and perhaps desirable, to 
determine the “critical point” in the marking 
system of a given college or university, that is, 
the point below which a freshman may not 
fall and have as much as fifty chances in one 
hundred for graduation. The mark in the 
distribution one probable error below the 
“critical point” is the minimum mark a stu- 
dent may receive and retain any hope for 
graduation. This may be called the “fatal 
point” in the marking-system of the school. 
Also, the mark one probable error above the 
“critical point” is the minimum mark a stu- 
dent may receive and retain any fear of fail- 
ure in the institution, according to the find- 
ings of this study. This point may be termed 
the “safety point” in the marking system of 
the school. 


7. In differential prediction of subject field 
marks, reliability of the predicting agent is of 


paramount importance. This is demonstrated
and emphasized by the fact that mental tests, 
with relatively high coefficients of reliability, 
are, in spite of their generalized nature and 
relatively low correlations with subject differ- 
ences, more efficient in prediction than are 
high school marks with relatively high corre- 
lations with differences in subject field marks, 
but with low coefficients of reliability, accord- 
ing to the technique of differential prediction 
used in this study. 


VII.


Although absolute evaluation of the meas- 
ures used is not a major objective in this 
study, a word might be said about the prob- 
able practical use of the findings. 

(a) The widely varying degrees of predic- 
tive efficiency as between the sexes, noted in 
each of the regressions used, should be of 
some service in warning administrators and 
personnel workers against the practice of 
using the same predictive agent for the whole 
student body, upon which grave decisions are
made that are vital to the future welfare of
the student.

(b) The decile technique used in this study 
to analyze the results of prediction, both as 
between the sexes and the abler and weaker 
students, or perhaps a similar device such as 
a quintile distribution based upon a five-point 
grading-system, may possibly be of service in 
enabling the research worker in this field to
make the results of his investigations more 
concrete and practical. 


(c) Demonstration of the superiority of 
multiple correlations over that of zero-order 
correlations in the assignment of more uni- 
form rankings to a given student should serve 
as a warning against the practice of sectioning 
or pigeon-holing students on the results of 
predictions from a single variable. 

(d) The differences in predictive efficiency 
among the general predictive agents used in 
this study, while not entirely significant
statistically, may possibly be of some value to
administrators and personnel workers in se- 
lecting materials for the evaluation and guid- 
ance of their students. 


(e) Demonstration of the relative efficiency 
of comparatively low correlations in the pre- 
diction of the ranks of students in the upper 
and lower deciles of the group may tend 
towards a readjustment in our thinking as to 
the minimum correlation that may be of serv- 
ice in college administration. If it will be of 
service to administrators and personnel work- 
ers to know at the time of enrollment, or soon 
thereafter, the approximate rank that seven to 
eight-tenths of the students will attain in their 
college work, then this will be so. 


(f) The fact that the Ohio State University
Psychological Examination rates mathematics
and foreign language far above the
other subject fields of English, science and
social science, and that there is a wide
difference in the relative rating of foreign language
by the Ohio State University Psychological
Examination and the American Council Test,
as revealed by the differential prediction
technique used in this study, suggests the
possibility of using this technique with profit for the
validation of tests.


VIII. QUESTIONS FOR FURTHER STUDY


1. Why do girls exceed and boys fall short
of predictions from mental tests?

2. Why do boys exceed and girls fall short
of predictions from high school marks?

3. How can the reliability of college marks be
raised?

4. How can character traits be scientifically
included in prediction teams?

5. Why are a student's rankings predicted by
regressions based upon multiple correlations
more uniform than are his rankings
predicted by regressions based upon zero-order
correlations of about the same
magnitude?


THE RELATIONSHIP BETWEEN THE TYPE OF QUESTION
AND SCORING ERRORS 


JACK W. DUNLAP


Fordham University 


A point often advanced in favor of objec- 
tive tests is their freedom from error in scor- 
ing. Nevertheless, that scores on objective 
tests are frequently in error has been shown 
by Pintner,° Dearborn and Smith,* and 
Herbst.* Several techniques have been pro- 
posed to overcome scoring errors: notably the 
self-scoring tests of Clapp and Young; print- 
ing the correct response on the answer sheet 
after the student has recorded his choice 
(proposed by Toops); mimeographing the 
correct answers on the answer sheet (a
modification of Toops' method proposed by
Cuff); the Testometer, a mechanical device 
for determining the score (designed by 
Cuff?); and such automatic scoring devices 
as the Perfo-scorer, the Thermo-scorer, and 
the Chemo-scorer (developed by Peterson 
and Peterson’). 

Some of the causes for errors in scoring 
are: carelessness in marking the response, mis- 
takes in counting the number of items marked 
right, wrong, or omitted, errors in arithmetic, 
errors in transmuting scores, and errors in 
transferring scores. It has been suspected for 
a long time that an important cause of error 
in scoring is the form in which the questions 
are presented, but so far as the writer knows, 
little attention has been given to the problem, 
at least in the literature. 

This paper is concerned with the factor of 
type or form in which the question is pre- 
sented and its relation to scoring errors. The 
basic question is: are certain types of test
questions more subject to scoring errors than 
are others? By type is meant the mechan- 
ical form of presenting the question and 
recording the response. Another question on 
which these data may throw some light is: do
scorers tend to underscore or overscore a 
test, that is, tend to give insufficient credit or 
to give undue credit? 

Three hundred ninety-eight Terman Group 
Tests of Mental Ability scored and rescored 
for another purpose furnish the basic data 
for this study*. The Terman Group Test of 

*The writer is indebted to Mr. A. Kroll, of Benjamin 


Franklin High School, New York City, for making these 
data available. 


Mental Ability has ten subtests. This set of
papers was analyzed to determine the number
of papers having scoring errors for each
subtest and the number of such errors occurring
in each subtest. The papers were scored by
thirty teachers under supervision. Three or four,
and occasionally more, teachers scored each
subtest. When a group was particularly slow in
scoring a subtest, unoccupied scorers were asked
to assist.
Slowness in scoring a particular test may have 
been due to any one or a combination of 
causes, such as length of the test, inherent 
difficulty in scoring that test, or to the slow- 
ness of that particular group of scorers. The 
data secured by this routine of scoring are not 
ideal for this purpose, since it may be con- 
tended that if a given teacher is prone to one 
type of error, the subtest she graded would 
show a disproportionate number of errors of 
that type. The ideal situation would have 
been for each teacher to have scored the same 
number of blanks for all subtests. It may be 
assumed, however, since three, four, or more 
teachers scored each subtest, that such con- 
stant errors tend to cancel out from one test 
to another. Even if this be false, it is worth- 
while to examine the data as to the relation- 
ship between errors and types of questions. 


An examination of Table I reveals that the 
number of papers having errors varies from 
11 in Test Nine to 116 in Test Three. The 
number of subtests originally overscored is 
236, while the number underscored is 298.
Thus, out of 3980 subtests, 534 or 13% were 
in error. Since the number of items varies 
from test to test, the likelihood of an error of 
scoring occurring, other things being equal, 
will be greatest in the longer tests. For pur- 
poses of comparison, therefore, the tests have 
been equated to a common base of twenty 
items. The data after equating are shown in 
the bottom part of Table I. The number of 
subtests having scoring errors after adjust- 
ment of the tests to a common length is 555. 
Thus, out of 3980 subtests, fourteen per cent 
are in error. The number of test papers having
errors varies from 12 in Test Nine to 103
in Test Eight.


TABLE I


THE NUMBER OF PAPERS IN ERROR BY SUBTEST WHEN RESCORED


A PLUS INDICATES PAPERS ORIGINALLY OVERSCORED; A MINUS, PAPERS UNDERSCORED.

                                                                                Total
                                          Subtests                              Subtests
                     1     2     3     4     5     6     7     8     9    10    In Error

Number of Papers in Error
  +                 [?]   30    50    [?]    7    17    27    [?]   [?]   [?]     236
  -                 [?]    8    66    [?]   15    48    40    [?]   [?]   [?]     298
  T                 27    38   116    77    22    65    67    92    11    19      534
  Rank               4     5    10     8     3     6     7     9     1     2

Number of Papers in Error if All Tests Had 20 Items
  +                 18    55    33    [?]   12    14    27    46    [?]   [?]
  -                  9    15    44    [?]   25    40    40    57    [?]   [?]
  T                 27    70    77    77    37    54    67   103    12    31      555
  % Papers
  In Error         6.8  17.6   [?]   [?]   [?]  13.6  16.8   [?]   [?]   [?]
  Rank               2     7   8.5   8.5     4     5     6    10     1     3


The first question to arise is: can the variation
between the various subtests be ascribed
to chance? The simplest method for testing
this is the method of chi square*. Since the
total number of subtests in error is 555, and
these fall into ten classes, the best estimate
of the number to be expected if only chance
is operating to cause errors in scoring is the
average, namely, 55.5. There are nine degrees
of freedom here, and the value of chi
square is 128.95. The one per cent point with
nine degrees of freedom is 21.666, so that
there can be no question that the difference
is due to some factor other than chance.


*The method of chi square is old, and although it is an
extremely important method, it has not been widely used.
For the convenience of the reader who has forgotten or is
unfamiliar with the method, the following references are given:
Garrett, H. E. Statistics in Psychology and Education,
Longmans, Green and Co., 1937, Pp. 119-124.
Guilford, J. P. Psychometric Methods, McGraw-Hill Book
Co., 1936, Pp. 92-93, 180-181.
Holzinger, K. J. Statistical Methods for Students in
Education, Ginn and Co., 1928, Pp. 245-24


The total number of scoring errors is shown
by subtest in Table II. Again, for the purposes
of comparison, the tests have been
equated to a common length of twenty items.
The total number of errors ranges from 15
for Test Nine to 254 for Test Three. The
last two lines of Table II show the average
and median number of errors occurring by
subtests for those blanks having errors. The
averages range from 1.4 per paper in Test
Nine to 6.3 in Test Two. Test Two is a
best-answer test, where one phrase in a series

of three is marked. It is unbelievable that 
6.3 items out of twenty, or 31.5 per cent,
were misscored. The median, however, is 2.0 
or 10 per cent. This extremely high average 
may be due to error of a single scorer but in 
opposition to this hypothesis is the fact that 
other tests of a similar form, namely Test Five 
and Test Ten also show undue error in 
scoring. 
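
The figures quoted for Test Two can be checked by simple division. The sketch below uses the Test Two entries of Table II (240 errors after equating to twenty items, 38 papers in error); the function names are illustrative only and are not part of the original study.

```python
BASE_ITEMS = 20  # common length to which the subtests were equated


def average_error(total_errors, papers_in_error):
    """Mean number of misscored items per paper, over papers that had errors."""
    return total_errors / papers_in_error


def as_per_cent_of_items(errors_per_paper, n_items=BASE_ITEMS):
    """Express errors per paper as a percentage of the items in the subtest."""
    return 100.0 * errors_per_paper / n_items


# Test Two in Table II: 240 errors (equated to twenty items), 38 papers in error.
mean_errors = round(average_error(240, 38), 1)
mean_per_cent = as_per_cent_of_items(mean_errors)

# The median of 2.0 items, expressed on the same twenty-item base.
median_per_cent = as_per_cent_of_items(2.0)

print(mean_errors, round(mean_per_cent, 1), round(median_per_cent, 1))
```

The large gap between the mean and the median per cent is what suggests that a few heavily misscored papers, rather than a uniform tendency, account for the high average.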

That the differences in number of errors 
from subtest to subtest is not ascribable to 
chance is shown by the chi square value of 
480.5. Again with nine degrees of freedom, 
the value of chi square for the one per cent 
level is 21.666; so there can be no question 
that some factor other than chance is
operating.
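
The chi square computation for the number of errors may be sketched as follows. The observed frequencies are the total errors per subtest after equating to twenty items, as given in Table II, with the total for Test One taken as the sum of its plus and minus entries (53). The expected frequency in each class is the average of the ten totals, as in the text; the function name is illustrative.

```python
def chi_square_uniform(observed):
    """Chi square against the hypothesis of equal frequencies in every class."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


# Total errors per subtest after equating to twenty items (Tests One to Ten).
errors_by_subtest = [53, 240, 254, 104, 88, 134, 97, 224, 15, 77]

chi2 = chi_square_uniform(errors_by_subtest)
degrees_of_freedom = len(errors_by_subtest) - 1
ONE_PER_CENT_POINT = 21.666  # value quoted in the text for nine degrees of freedom

print(round(chi2, 2), degrees_of_freedom, chi2 > ONE_PER_CENT_POINT)
```

With these entries the statistic comes to approximately 480.6, in agreement with the value of 480.5 reported above.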

In Table III, the subtests are classified as 
to type of question. Tests Three, Six, and 
Eight are two-choice tests, respectively yes—no, 
same—opposite, and true-false. Tests One, 
Seven and Nine are multiple choice tests 
where the student underlines a single word in 
the text. Test four is also a multiple choice 
test, but the subject must underline two 
words in the text. In the last three Tests, 
Two, Five, and Ten, the correct response is 
written on the margin of the test. Under the 
heading “papers in error” is given the rank 
of the test in terms of the raw data and the 
rank after the tests have been adjusted to a 
common length. The correlation between 
these two ranks is .91. When these tests are
adjusted to a common length, it is readily
seen that the multiple choice tests where a
single word in the text is indicated as the
answer show the fewest errors, closely
followed by the tests where a single correct
response is written in the margin.

The most difficult test to score is the one
in which a choice of two answers is indicated
on the margin, as same-opposite, yes-no, or
true-false. Equally difficult to score is the
test where two words must be underlined in
the text. The mean number of papers in
error by type of test is 35 for type two
(multiple choice, indicating a single word in
the text), 46 for type four (writing correct
response in margin), 78 for type one (two
choice, indicating a single response on the
margin), and 77 for type three (multiple
choice, indicating two words in the text).


TABLE II


THE TOTAL AMOUNT OF ERROR BY SUBTEST, THE TOTAL ERROR BY SUBTEST WHEN THE TESTS
ARE EQUATED TO TWENTY ITEMS, AND THE AVERAGE ERROR FOR PAPERS HAVING ERRORS

                                          Subtest                               Total
                     1     2     3     4     5     6     7     8     9    10    Errors
Papers
In Error            27    38   116    77    22    65    67    92    11    19

Number of errors
  +                 30    80   136    49    24    34    41    76     6    30
  -                 23    52   245    55    29   139    56   125     7    16
  T                 53   132   381   104    53   173    97   201    13    46
  Rank             3.5     7    10     6   3.5     8     5     9     1     2

Number of errors if each test had 20 items
  +                 30   146    91    49    40    28    41    85     7    50      567
  -                 23    94   163    55    48   106    56   139     8    27      719
  T                 53   240   254   104    88   134    97   224    15    77     1286
  Rank               2     9    10     6     4     7     5     8     1     3

Average Error      2.0   6.3   2.2   1.4   4.0   2.1   1.5   2.4   1.4   4.0
Median Error       1.0   2.0   2.0   1.0   1.0   2.0   1.0   2.0   1.0   2.0


TABLE III


THE TESTS CLASSIFIED AS TO TYPE AND RANKED ACCORDING TO (a) NUMBER OF TESTS HAVING
ERRORS AND (b) NUMBER OF ERRORS PER TEST FOR UNADJUSTED AND ADJUSTED DATA

                                           Papers in Error               Errors Per Test
                                       Raw Data    Corrected Data    Raw Data    Corrected Data
                                             Mean          Mean            Mean          Mean
Type of Test                   Test    Rank  Rank    Rank  Rank      Rank  Rank    Rank  Rank
1. Two Choice, Indicating        3      10           8.5              10            10
   a Single Response on          6       6    9       5    8.5         8    9        7    8
   the Margin                    8       9           10                9             8
2. Multiple Choice, Indi-        1       4            2               3.5            2
   cating a Single Word          7       7    4       6    2           5    3.5      5    2
   in the Text                   9       1            1                1             1
3. Multiple Choice, Indi-
   cating Two Words in           4       8    8      8.5   8.5         6    6        6    6
   the Text
4. Writing                       2       5            7                7             9
   Correct Response              5       3    3       4    4          3.5   3.5      4    4
   on Margin                    10       2            3                2             3
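
The correlation of .91 between the raw and corrected ranks for papers in error can be checked from the ranks in Table III. The paper does not say which coefficient was used; the sketch below assumes Spearman's rank-difference formula, applied without any correction for the tied ranks of Tests Three and Four.

```python
def spearman_rho(rank_a, rank_b):
    """Rank-difference correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(rank_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Ranks of Tests One to Ten by number of papers in error (Table III).
raw_rank = [4, 5, 10, 8, 3, 6, 7, 9, 1, 2]
corrected_rank = [2, 7, 8.5, 8.5, 4, 5, 6, 10, 1, 3]

rho = spearman_rho(raw_rank, corrected_rank)
print(round(rho, 2))
```

Under that assumption the coefficient rounds to the reported figure, which indicates that equating the tests to a common length changes the ordering of the subtests only slightly.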

The samples of types of tests
are so small that further statistical refinement
is not necessary, but the evidence seems be- 
yond cavil that tests 3, 4, 6, and 8 represent 
types that are more difficult to score than 
are the other types studied. 

The question arises, do scorers tend to 
underscore or overscore? Since the number 
of errors in the adjusted series is 1286, half 
of these should be positive and half should 
be negative if chance alone determined the 
distribution. It cannot, however, be assumed 
that these are spread equally throughout the 
ten tests, which would give twenty categories


TABLE IV


THE NUMBER OF I.Q.'s THAT WERE UNDER- AND OVER-ESTIMATED TOGETHER WITH THE
MAGNITUDE OF THE ERRORS


9-16
12


and nineteen degrees of freedom, since it has
been shown that chance alone does not oper- 
ate in the distribution from subtest to sub- 
test. It is, however, legitimate to assume 
that within a test the errors should be equally 
distributed. The total number of positive 
errors is 567 and the number of negative 
errors 719. Computing chi square on this as- 
sumption gives a value of 102.12, which is far 
in excess of the one per cent level with nine- 
teen degrees of freedom, namely 36.191. 
There is a decided tendency for this group to 
underscore rather than to overscore. 
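
The text does not spell out how this chi square was computed. One reading, offered here only as a sketch, is that within each subtest half of the errors are expected to be positive and half negative, which gives the twenty categories mentioned. The plus and minus counts are those of Table II after equating to twenty items. As transcribed, these counts need not reproduce the reported value of 102.12 exactly, but the statistic they yield is of the same order and far beyond the one per cent point.

```python
# Errors after equating to twenty items, Tests One to Ten (Table II).
plus_errors = [30, 146, 91, 49, 40, 28, 41, 85, 7, 50]     # overscored
minus_errors = [23, 94, 163, 55, 48, 106, 56, 139, 8, 27]  # underscored


def chi_square_even_split(plus_counts, minus_counts):
    """Chi square when each subtest's errors are expected to divide equally."""
    total = 0.0
    for p, m in zip(plus_counts, minus_counts):
        expected = (p + m) / 2
        total += (p - expected) ** 2 / expected
        total += (m - expected) ** 2 / expected
    return total


chi2_split = chi_square_even_split(plus_errors, minus_errors)
ONE_PER_CENT_POINT_19_DF = 36.191  # value quoted in the text

print(sum(plus_errors), sum(minus_errors))
print(round(chi2_split, 2), chi2_split > ONE_PER_CENT_POINT_19_DF)
```

Most of the total comes from Tests Three, Six, and Eight, the two-choice tests, where underscoring is much more frequent than overscoring.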


Since these data are based on an intelli- 
gence test, it seems appropriate to determine 
the number of students whose I. Q. was in- 
correctly determined. Table IV shows the 
number of students whose I. Q. was under- or
over-estimated, together with the size of the
error. The two hundred twenty-five cases 
that had errors in the I. Q. are due almost 
wholly to errors in scoring. Only ten of the 
errors, 2.5%, were due to mistakes in
determining the chronological age, which was
computed by means of a table. It should be
noted that there are 47 cases, 11.8% of all 
cases considered, where the I. Q. is in error 
by nine points or more. Thus, one child in 
eight, approximately, is misplaced by nine or 
more points of I. Q. There seems to be no 
question that tests on which I. Q.’s are to be 
determined should be rescored. 

In conclusion, it appears that some types
of questions are more likely to be misscored
than others. This seems particularly true of 
the true—false, yes-no, and same—opposite 
tests where the subject is required to under- 
line one of the terms, and for tests where the 
subject underlines two words in the text. If 
this is not an artifact of the particular group 
of scorers used, and if the use of items in 
this form should not be discontinued, either 
the tests must be rescored or some
mechanical method of scoring must be utilized.